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Abstract — A  new  approach  is  introduced  to  estimating  object  sur¬ 
faces  in  three-dimensional  space  from  two  or  more  images.  A  surface 
of  interest  here  is  modeled  as  a  3-D  function  Known  up  to  the  values  of 
a  few  parameters.  Although  the  approach  will  work  with  any  param¬ 
eterization,  we  model  objects  as  patches  of  spheres,  cylinders,  planes, 
and  general  quadrics— primitive  objects.  Primitive  surface  estimation 
is  treated  as  the  general  problem  of  maximum  likelihood  parameter  es¬ 
timation  of  the  a  priori  unknown  primitive  surface  parameters  based 
on  two  or  more  functionally  related  data  sets.  In  our  case,  these  data 
sets  constitute  two  or  more  images  taken  by  cameras  at  different  lo¬ 
cations  and  orientations.  A  simple  geometric  explanation  is  given  for 
the  estimation  algorithm.  Although  various  techniques  can  be  used  to 
implement  this  nonlinear  estimation,  we  discuss  the  use  of  gradient 
descent.  Experiments  are  run  and  discussed.  Our  approach  includes 
the  commonly  used  stereo  approaches  as  special  cases.  The  Cramer- 
Rao  lower  bounds  are  derived  for  the  achievable  error  covariance  mat¬ 
rices  for  estimators  for  the  a  priori  unknown  parameters.  No  surface 
reconstruction  can  be  more  accurate  than  these  bounds.  The  depen¬ 
dence  of  the  bounds  on  object  surface  pattern  and  on  the  camera  and 
object  geometry  is  shown  explicitly.  An  interesting  result  arising  in  this 
work  is  that  maximum-likelihood  estimation  of  3-D  surfaces  also  re¬ 
quires  maximum  likelihood  estimation  of  the  pattern  on  the  object  sur¬ 
face.  Object  surface  segmentation  into  primitive  object  surfaces,  and 
primitive  object-type  recognition  are  readily  implemented  using  the 
probabilistic  framework  developed  in  this  paper.  The  attractiveness  of 
our  probabilistic  formulation  is  that  it  now  permits  a  fully  Bayesian 
approach  to  3-D  surface  estimation  based  on  images  taken  by  cameras 
in  two  or  more  positions.  For  example,  recent  follow-on  papers  include 
estimation  of  parameterized  surfaces  based  on  a  large  number  of  im¬ 
ages  taken  by  a  moving  camera  [II],  [27],  estimation  of  stochastic  sur¬ 
faces  based  on  images  taken  by  cameras  in  two  or  more  positions  [3], 
and  estimation  of  object  surfaces  given  contour  models  for  the  patterns 
on  the  surface  [28] . 

Index  Terms — Bayesian  parameter  estimation,  Cramer-Rao  bounds, 
estimation  accuracy,  maximum  likelihood  surface  estimation,  robot  vi¬ 
sion,  shape  from  motion,  stereo,  3-D  shape  from  multiple  images. 

I.  Introduction 

SSENTIALLY  all  3-D  object  surface  estimation  from 
multiple  views  to  date  is  based  on  either  active  stereo 
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using  a  laser  and  one  or  two  cameras  for  triangulation,  or 
on  passive  stereo  involving  matching  points  in  two  images 
and  using  triangulation,  or  on  optical  flow  [9],  [15],  [18], 
[24],  [10].  We  suggest  a  new  approach,  first  presented  in 
[1],  [2],  in  which  surfaces  of  complex  objects  are  approx¬ 
imated  by  a  small  number  of  parameterized  3-D  surface 
patches,  and  these  parameters  are  estimated  from  two  or 
more  images  taken  by  calibrated  cameras  from  different 
locations  and  directions.  Note  that  most  manufactured  3-D 
objects  of  interest  can  be  well  represented  by  a  small 
number  of  patches  of  planes,  spheres,  cylinders,  and 
cones,  and  many  natural  3-D  surfaces  are  also  well  rep¬ 
resented  by  these  surface  patches.  (In  theory,  our  ap¬ 
proach  applies  to  any  parameterized  surface.)  We  view 
the  problem  within  the  general  context  of  estimating  a 
priori  unknown  parameters  given  a  data  set  consisting  of 
two  or  more  functionally  related  subsets.  Estimation  ac¬ 
curacy  is  achieved  by  processing  data  in  large  blocks 
(large  patches  of  images  in  our  problem).  An  important 
contribution  of  this  paper  is  that  we  derive  the  expression 
for  the  joint  likelihood  of  two  images  taken  by  cameras 
in  different  positions.  This  likelihood  is  a  function  of  a 
priori  unknown  parameters,  namely,  the  parameters  spec¬ 
ifying  the  3-D  object  to  be  estimated.  We  then  treat  3-D 
object  surface  estimation  as  maximum  likelihood  estima¬ 
tion,  thus  using  estimators  having  very  desirable  proper¬ 
ties.  However,  of  greater  significance  is  that  being  able 
to  express  the  joint  likelihood  of  data  in  two  or  more  im¬ 
ages  as  a  function  of  the  unknown  parameters  to  be  esti¬ 
mated  opens  the  way  to  treating  the  estimation  of  3-D  ob¬ 
jects  based  on  these  images  entirely  within  a  Bayesian 
framework.  This  permits  the  ready  solution  of  many  prob¬ 
lems.  As  an  example,  we  derive  the  Cramer-Rao  lower 
bounds  on  the  error  covariance  matrix  for  the  estimation 
of  the  a  priori  unknown  parameters  characterizing  a  prim¬ 
itive  3-D  surface  patch.  No  surface  reconstruction  can  be 
more  accurate  than  these  bounds!  Other  extensions  that 
we  have  formulated  and  explored  based  on  Bayesian 
methods  are  the  maximum  a  posteriori  likelihood  esti¬ 
mation  of  surfaces  modeled  by  stochastic  processes  [3], 
and  the  maximum  likelihood  estimation  of  surfaces  based 
on  a  sequence  of  images  taken  by  a  moving  camera  but 
using  a  modest  amount  of  storage  and  computation  [11], 
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[27].  Our  approach  to  the  processing  of  a  sequence  of  im¬ 
ages  is  more  general  than  and  has  advantages  over  the  use 
of  the  Kalman  filter.  Since  we  use  the  joint  likelihood  for 
patches  of  two  or  more  images  as  a  function  of  the  a  priori 
unknown  3-D  surface  parameters  to  be  estimated,  other 
Bayesian  problems  are  easily  formulated  and  imple¬ 
mented.  For  example,  by  using  sizable  patches  of  images, 
the  form  of  the  dependence  of  the  likelihood  function  on 
the  parameters  to  be  estimated  becomes  simple,  and  min¬ 
imum  probability  of  error  recognizers  can  be  easily  im¬ 
plemented  for  recognizing  the  shape  model  for  the  patch 
of  surface  being  viewed,  i.e.,  for  deciding  whether  it  is  a 
plane,  a  sphere,  a  cylinder,  etc.  (see  [12]).  Being  able  to 
compute  the  joint  likelihood  of  patches  of  data  from  two 
or  more  images  also  permits  us  to  do  maximum  likelihood 
segmentation  of  a  complex  3-D  surface  into  primitive  sur¬ 
faces,  each  specified  by  a  single  model.  For  example,  the 
segmentation  can  be  into  smooth  surfaces,  i.e.,  those 
without  tangent  discontinuities  (see  [26]).  Our  approach 
assumes  the  use  of  calibrated  cameras.  If  the  camera  cal¬ 
ibration  is  unknown,  then  our  approach  can  still  be  used 
but  must  incorporate  the  a  priori  unknown  camera-model 
parameter-values  as  additional  parameters  to  be  esti¬ 
mated. 

Central  to  3-D  surface  estimation  from  two  (or  more) 
images  taken  from  cameras  in  different  locations  and  ori¬ 
entations  has  been  the  pairing  of  points  from  two  images 
that  are  images  of  the  same  point  on  a  3-D  surface.  This 
matching  of  points  in  two  images  is  usually  done  in  either 
of  two  ways.  1)  If  the  two  cameras  are  physically  close 
and  their  optical  axes  are  almost  parallel,  then  their  im¬ 
ages  will  differ  from  one  another  only  by  translation— one 
will  be  a  shifted  version  of  the  other.  Then  image  1  can 
be  partitioned  into  patches,  and  each  patch  cross-corre¬ 
lated  with  image  2  to  find  its  location  in  image  2.  Once 
this  correspondence  is  known,  the  location  of  the  surface 
region  in  3-D  space  seen  in  the  pair  of  corresponding  im¬ 
age  patches  can  be  determined  by  triangulation.  Since  the 
surface  region  seen  is  usually  curved,  one  would  like  the 
patches  to  be  small  in  order  to  locate  the  surface  region 
seen  accurately.  However,  if  the  images  are  noisy,  large 
surface  patches  must  be  used  in  order  to  overcome  the 
effects  of  the  noise,  thus  introducing  some  error  in  deter¬ 
mining  the  correspondences  between  pairs  of  points  that 
are  in  the  patch  interiors.  In  addition  to  this  correspon¬ 
dence  error,  there  is  always  some  error  in  camera  calibra¬ 
tion.  These  sources  of  error  can  result  in  appreciable 
triangulation  error  because  the  camera  optical  axes  are  al¬ 
most  parallel.  2)  An  alternative  approach  that  permits  a 
large  angle  between  the  camera  optical  axes  to  improve 
triangulation  accuracy  is  to  match  corresponding  small  lo¬ 
cal  features  in  the  two  images.  An  example  of  such  a  fea¬ 
ture  is  a  vertex  of  a  polyhedron  or  an  arc  in  a  boundary. 
The  difficulty  here  is  that  a  large  amount  of  pattern  rec¬ 
ognition  may  be  necessary  to  recognize  a  pair  of  corre¬ 
sponding  features  in  the  two  images. 

Among  a  few  interesting  new  approaches  to  stereo 
ranging  are  the  following.  Bolles  and  Baker  [5]  consider 
a  situation  in  which  a  camera  moves  in  a  straight  line  with 


the  direction  of  the  optical  axis  fixed  (in  the  experiments 
described),  and  successive  images  are  taken  very  close  to 
one  another  such  that  a  picture  differs  from  its  predecessor 
only  by  a  shift  of  one  or  two  pixels.  Then  the  images  are 
stacked  one  in  front  of  the  other  such  that  they  form  a 
three  dimensional  array  with  time  being  the  third  axis. 
The  result  is  that  a  point  on  the  object  surface  is  now  seen 
as  a  line  or  a  smooth  curve  in  this  three  dimensional  array, 
and  surface  point  estimation  is  accomplished  by  line  or 
curve  estimation  in  3-space.  Among  the  potential  draw¬ 
backs  of  this  approach  are  that  camera  motion  is  re¬ 
stricted,  data  must  be  processed  at  a  veiy  high  rate,  not 
much  information  has  been  presented  on  the  amount  of 
required  processing  for  estimating  a  suitable  number  of 
lines,  and  no  information  has  been  presented  on  the  ac¬ 
curacy  of  the  method.  Miller  [6]  developed  an  approach 
where  two  cameras  scan  a  scene  along  epipolar  lines  as¬ 
sociated  with  the  same  epipolar  plane.  His  problem  is  to 
estimate  the  disparity  between  the  two  images,  i.e.,  to 
match  up  points  on  the  line  in  image  2  with  points  on  the 
line  in  image  1  that  are  views  of  the  same  surface  point. 
This  is  accomplished  through  use  of  a  phase-locked  loop. 
The  system  has  the  advantage  of  potentially  working  in 
real  time.  It  has  the  potential  disadvantages  that  the  ac¬ 
curacy  may  not  be  very  good  (no  measure  is  given),  oc¬ 
clusion  is  a  problem,  initial  lock-in  may  be  a  problem. 
Castan  and  Shen  [7]  model  the  distortion  from  one  image 
to  another  under  the  assumption  that  the  surface  being 
viewed  is  a  planar  patch.  They  approximate  images  lo¬ 
cally  as  second  degree  polynomials  using  Taylor  series, 
explore  features  that  are  invariant  from  one  image  to  the 
other,  and  use  the  equations  relating  the  Taylor  series  ex¬ 
pansions  at  corresponding  points  in  the  two  images  to  es¬ 
timate  the  planar  surface.  No  mention  seems  to  be  made 
of  how  they  solve  the  correspondence  problem,  and  other 
important  details  are  missing,  so  that  it  is  difficult  to  ap¬ 
preciate  the  advantages,  disadvantages,  and  accuracy  of 
the  method.  Cohen  [29]  deals  with  planar  surface  patches, 
and  uses  Markov  Random  Fields  (MRF)  to  model  pattern 
texture  on  these  surfaces.  Planar  patch  orientation  and  lo¬ 
cation  is  then  estimated  by  comparing  the  parameters  of 
MRF  models  fit  to  pairs  of  patches  one  in  each  image. 
This  elegant  work  represents  a  significant,  potentially  very 
effective  advance  in  3-D  surface  reconstruction,  and  is  in 
harmony  with  our  concept  of  the  joint  estimation  of  3-D 
surface  model  and  surface  pattern  model  in  this  paper  and 
in  [28].  Faugeras,  Ayache,  and  Faveijon  [8]  develop  the 
idea  of  estimating  points  on  a  3-D  object  surface,  lines  on 
a  surface  (and  suggest  planar  surfaces)  from  a  sequence 
of  images.  More  specifically,  they  assume  that  some 
method  has  been  used  to  estimate  points  on  a  surface  based 
on  a  pair  of  images,  and  assume  that  the  probability  dis¬ 
tribution  for  these  estimates  can  be  determined.  They  then 
assume  that  a  sequence  of  such  estimates  and  associated 
distribution  are  known  for  a  sequence  of  images.  Their 
contribution,  then,  is  to  use  the  extended  Kalman  filter 
for  combining  this  sequence  of  estimates  to  obtain  im¬ 
proved  estimates  of  the  surface  points.  They  derive  the 
equations  for  estimating  points  and  lines,  and  suggest  that 
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it  can  be  extended  to  planes.  Among  the  errors  they  take 
into  account,  are  those  in  camera  calibration.  Their  con¬ 
cept  is  important,  though  they  do  not  tackle  here  the  prob¬ 
lem  of  initially  optimally  estimating  the  surface  points  or 
lines  from  a  pair  of  images.  Ohta  and  Kanade  [41  extend 
an  approach  of  Baker  but  use  intervals  between  edges  in¬ 
stead  of  edges  themselves,  and  apply  the  dynamic  pro¬ 
gramming  technique  performing  both  the  intra-  and  the 
inter-scanline  search  simultaneously  for  matching  points 
in  the  two  images.  Waxman  and  Wohn  [10]  take  a  differ¬ 
ent  approach,  focusing  on  the  use  of  optical  flow  to  esti¬ 
mate  the  normals  of  3-D  surfaces  by  solving  some  local 
equations.  To  estimate  position,  they  suggest  using  bi¬ 
nocular  optical  flow.  Being  able  to  extract  information  by 
analytically  solving  equations  is  attractive.  On  the  other 
hand,  3-D  surface  estimation  based  on  optical  flow  is  less 
accurate  than  the  use  of  stereo  because  of  the  small  base¬ 
line.  Last,  we  mention  a  recent  paper  by  Eastman  and 
Waxman  [25].  They  use  two  cameras  having  parallel  op¬ 
tical  axes  and  locally  model  the  disparity  function  as  a 
quadric.  Then,  upon  matching  corresponding  contours  in 
the  two  images,  they  can  solve  for  the  a  priori  unknown 
parameters  in  the  quadratic  disparity  function,  and  from 
this  can  obtain  an  estimate  of  a  local  quadric  approxima¬ 
tion  to  the  3-D  surface  depth  function.  This  has  some  sim¬ 
ilarity  to  our  estimation  algorithm  in  Section  II  (presented 
earlier  [2]).  There  are  significant  differences.  Their  ap¬ 
proach  has  the  desirable  feature  that  once  contour  match¬ 
ing  has  been  accomplished,  estimation  of  the  disparity 
function  locally  is  computationally  simple.  Some  positive 
features  of  our  approach  are  the  following.  We  estimate 
sizable  3-D  surfaces  directly,  and  these  surfaces  can  be 
arbitrary  parameterized  or  stochastic  surfaces.  Camera 
relative  positions  can  be  completely  arbitrary.  The  com¬ 
putational  complexity  of  matching  up  pairs  of  contours— 
one  contour  in  each  image— is  avoided;  instead  we  search 
in  the  surface  parameter  space  (which  could  also  be  costly , 
but  can  be  done  with  simple  parallel  processing).  Last, 
since  our  formulation  is  a  Bayesian  one  we  can  make  op¬ 
timal  use  of  probabilistic  prior  information  concerning  the 
surfaces  to  be  estimated.  Our  paper  is  an  expansion  of  one 
where  our  3-D  surface  estimation  algorithm  was  first  pro¬ 
posed  [1],  [2].  The  preceding  papers  have  been  directed 
at  estimating  points,  curves,  planes,  cylinders,  spheres, 
or  general  parameterized  surfaces  in  3-D  space.  In  general 
for  making  inferences  about  3-D  surfaces  irrespective  of 
the  type  of  sensing  used,  the  idea  of  Besl  and  Jain  [19] 
and  Bolle  and  Sabbah  [20]  for  the  representation  of  sur¬ 
face  patches  by  their  Gaussian  and  mean  curvatures  ap¬ 
pears  to  be  a  very  useful  generalization. 

Sections  II-A-II-F  introduce  our  3-D  surface  estima¬ 
tion  algorithm,  show  examples  of  its  use  with  planar, 
spherical,  and  cylindrical  surfaces,  and  show  that  the  es¬ 
timation  is  maximum  likelihood  estimation.  The  shape  of 
the  likelihood  function,  as  the  parameters  to  be  estimated 
are  varied,  is  explored.  The  algorithm  in  Section  II-D, 
based  on  the  use  of  a  sequence  of  images,  is  a  computa¬ 
tionally  simple  way  around  the  problem  of  having  to  min¬ 


imize  a  multimodal  performance  functional.  Experiments 
are  shown  with  both  partially  artificially  generated  data, 
and  with  real  data  taken  by  a  moving  camera.  Section  II-F 
deals  with  a  fundamental  complication  arising  from  the 
quantization  of  images  into  pixels.  To  this  point,  the  fo¬ 
cus  has  been  on  the  imaging  geometry,  an  algorithm  for 
3-D  surface  estimation,  and  experiments. 

Sections  III  and  III-A  are  devoted  to  the  derivation  of 
the  Cramer-Rao  Lower  Bounds  on  the  error  covariances 
for  the  a  priori  unknown  parameters  to  be  estimated  for 
the  object  models  used.  This  involves  working  through 
some  algebra,  but  is  important  because  in  25  or  more  years 
of  published  literature  on  computer  vision,  there  are  al¬ 
most  no  upper  or  lower  bounds  on  achievable  estimation 
accuracy  tied  to  the  raw  image  data.  The  only  published 
results  we  are  aware  of  are  [21]  and  [22],  [23]  .  The  pres¬ 
ent  problem  is  sufficiently  general  that  derivation  of  the 
C.R.  Bounds  here  provides  insight  into  how  they  may  be 
applied  elsewhere.  For  those  not  interested  in  the  deri¬ 
vation  of  the  bounds,  the  final  expressions  are  given  by 
Eqs.  (37)  and  (41),  and  pertinent  discussion  is  given  in 
Section  III-A.  Section  III-B  and  III-C  provide  a  physical 
interpretation  of  the  Bound,  showing  the  explicit  depen¬ 
dence  on  camera  geometry,  object  surface  pattern,  and 
object  surface  parameterization.  And  Section  III-D  con¬ 
tains  an  example  of  a  numerical  computation  of  the 
Bound. 

A.  Notation  and  Description  of  Camera  Motion 

Let  P  be  a  point  in  3-D  space  and  r,T  =  (x',  y\  z' )  be 
its  representation  in  the  fixed  orthogonal  world  reference 
frame.1  Since  we  assume  that  objects  do  not  move,  this 
reference  frame  is  fixed  with  respect  to  the  objects  viewed 
by  the  camera,  and  we  will  call  it  the  object  reference 
frame  (ORF).  Let  r7  =  (. x,y,z )  be  the  representation,  of 
the  point  P ,  in  the  orthogonal  reference  frame  attached  to 
the  camera.  This  reference  frame,  the  camera  reference 
frame  (CRF),  is  such  that:  1)  The  camera  optical  axis  is 
parallel  to  the  z  axis,  and  it  looks  at  the  negative  z  axis. 

2)  The  x  and  y  axes  are  parallel  to  the  sides  of  the  image. 

3)  The  origin  of  the  camera  reference  frame  coincides  with 
the  center  of  the  image  plane.  The  image  is  corrected  so 
that  the  view  is  not  inverted  top  to  bottom  and  left  to  right, 
i.e.,  a  central  projection  is  used. 

Let  B  denote  the  3  x  3  orthogonal  rotation  matrix  that 
specifies  the  three  unit  coordinate  vectors  for  the  CRF  in 
terms  of  the  three  unit  coordinate  vectors  for  the  ORF. 
Let  /*'  specify  the  origin  of  the  CRF  in  the  ORF.  Then 

r  —  BT(rf  -  r'),  and  r*  =  Br  +  r'c.  (1) 

Note  that  the  rotation  matrix  B  and  the  translation  vector 
r'c  are  assumed  to  be  known. 

‘A  symbol  in  boldface  is  a  column  vector,  a  superscript  capital  T  at¬ 
tached  to  a  vector  denotes  vector  transpose. 
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B.  Images  of  an  Object  Surface  Point  in  Two  Image 
Frames 

Fig.  1  illustrates  the  orthographic  imaging  model  [15] 
used,  i.e.,  all  rays  from  points  on  the  object  surface  to 
the  image  plane  are  roughly  parallel.  This  model,  an  ap¬ 
proximation  to  the  pinhole  model  (perspective  projec¬ 
tion),  is  explained  in  Appendix  A.  It  is  applicable  when 
the  object  diameter  is  small  compared  with  object  to  cam¬ 
era  distance.  With  slight  modification,  all  of  our  algo¬ 
rithms  can  be  run  using  the  pinhole  model,  and  the  ex¬ 
periment  using  real  images  taken  by  a  moving  camera, 
shown  at  the  end  of  Section  II-D,  used  the  pinhole  model 
in  the  surface  estimation.  Let  P  denote  a  point  on  a  pa¬ 
rameterized  3-D  surface  of  interest.  Point  P  on  the  object 
surface  is  seen  as  points  having  coordinates  s  and  u  in 
images  1  and  2,  respectively.  We  assume  a  Lambertian 
reflectance  model.  Then  the  images  of  point  P  at  s  and  u 
will  have  the  same  intensity.  The  techniques  proposed  will 
not  apply  without  modification  to  specular  reflectors  be¬ 
cause  the  location  of  points  on  the  object  surface  at  which 
specular  reflection  occurs  depends  on  the  camera  loca¬ 
tion.  Since  most  surfaces  of  interest  are  largely  Lamber¬ 
tian,  the  assumption  is  a  useful  one.  Hence, 

h(u)  =  IM  (2) 

where  Ix(u)  and  I2(s)  are  the  picture  functions  (image  in¬ 
tensity  functions)  in  Frames  1  and  2,  respectively.  For 
those  cases  where  the  Lambertian  assumption  does  not 
apply,  a  possible  modified  approach  is  to  use  an  edge  map. 
Here,  pixels  are  given  values  of  128  or  0  depending  on 
whether  they  are  detected  as  being  edge  points  or  non¬ 
edge  points,  respectively.  These  maps  are  then  smoothed 
to  obtain  more  continuous  arrays,  and  these  are  used  as 
though  they  are  regular  picture  functions  in  our  estimation 
algorithms.  The  usefulness  of  the  edge  map  is  that  it  is  a 
representation  of  rapid  changes  in  the  object  surface  pat¬ 
terns,  and  largely  unaffected  by  the  presence  of  some 
specular  component  in  the  object  surface.  See  [24]  for 
other  approaches  to  partially  specular  surfaces. 

Our  approach  is  applicable  to  any  parameterized  sur¬ 
face.  However,  in  this  paper  we  illustrate  the  approach 
through  the  use  of  planar,  cylindrical,  spherical,  and  gen¬ 
eral  quadric  primitive  surfaces.  The  equation  of  the  gen¬ 
eral  quadric  surface  is  given  by  (3).  The  other  three  prim¬ 
itive  quadrics  are  obtained  by  imposing  suitable 
constraints  among  the  coefficients  in  (3). 

anx2  +  2anxy  +  2  anxz  +  a22y 2  +  2  a2iyz 

+  a33z2  +  2al4x  +  2a243’  +  2a34z  +  £44  =  0.  (3) 

These  surfaces  are  described  in  the  ORF  which  we  take 
to  be  CRF1,  i.e.,  the  first  camera  reference  frame,  and 
are  uniquely  determined  by  specifying  the  values  of  a  pa¬ 
rameter  vector  a.  Denote  aT  =  (aUy  al2,  *  *  *  ,  a44).  For 
a  general  surface,  a  will  have  K  components.  Thus,  the 
image  at  point  s  T  —  (jc(  1 ),  y  ( 1 ) )  in  the  first  image  frame 
is  the  image  of  a  point  having  coordinates  x(l),  y(l), 
z  ( 1 )  on  the  object  surface  where  z  ( 1 )  is  the  solution  of 


Fig.  1.  Images  of  an  object  surface  point  in  two  image  frames. 

(3)  when  x(l)  and  y  (1)  are  specified.  We  see  that  the  pic¬ 
ture  function  at  point  s  in  CRF1  is  the  picture  function  at 
point  u  in  CRF2  where 

(uT,  z(2))r  =  Br(r(l)  -  rc(2))  (4) 

and  uT  —  (x(2),y(2)).  Parameters  specifying  this  trans¬ 
formation  are  three  rotation  angles  specifying  the  rotation 
matrix  B ,  and  three  translation  components  specifying  the 
location  rc(  2)  of  the  origin  of  CRF2  in  CRF1.  Denote 
this  six  component  parameter  vector  by  b .  Denote  the 
functional  relationship  between  s  and  u  by  (5) . 

u  =  h(s ,  b ,  z{s,  a)).  (5) 

Note  u  depends  explicitly  on  s  and  z(  1 ),  and  z(  1 )  is  de¬ 
termined  by  both  s  and  a  as  shown  in  (3). 

C.  On  the  Form  of  u  =  h(s}  b ,  z(s,  a)) 

Assume  we  have  a  rigid  surface  in  3-D  space.  The 
equation  of  this  surface  with  respect  to  the  object  refer¬ 
ence  frame  is 


*(r(i))  =  o 

(6) 

and  for  CRF2,  upon  using  (4), 

g(B(2)r(2)  +  rc(2))  =0. 

(7) 

Denote 

C  =  Bt{  2) 

(8) 

and 

d  =  —Bt(2)  rc(2). 

(9) 

Then 

r(  2)  = 

0(1)  +  d 

(10) 

We  partition  the  C  matrix  as: 

c  = 

r  ^  -1 
Ln  c  12 

(ID 

L^21  ^22J 


where  Cu  is  2  x  2,  cx2  and  c2{  are  2  x  1  and  c22  is  a 
number.  Partition  d  as  dT  =  ( eTd3 )  where  e  is  2  x  1  and 
d3  is  a  number. 
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Then  from  the  preceding 

«( 2)  =  C„«(  1)  +  cnz{  1)  +  €.  (12) 

Equation  (12)  is  the  parametrized  equation  of  a  line. 
That  is,  a  point  with  coordinates  uT(  1 )  =  (jc(  1 ),  y ( 1 )) 
in  the  first  frame  is  the  image  of  some  point  in  the  surface. 
Equation  (12)  gives  the  locus  of  possible  locations  (pa¬ 
rametrized  by  the  unknown  z  ( 1 ) )  that  the  point  may  have 
in  frame  2.  This  is  the  epipolar  line  in  image  plane  2  as¬ 
sociated  with  point  u  ( 1 )  in  image  plane  1 . 

II.  Estimation  of  Surface  Parameters  a  Using 
Two  Images 

If  b  is  known  and  a  is  aT,  the  true  a ,  then 

IM  =  I2(h(s,  b,  z(s,  aT)))  (13) 

for  each  s.  Choose  a  square  M  x  M  pixel  window  in 
CRF1.  Denote  this  pixel  set  by  D.  Consider  the  error 
measure 

eD(a)  =  M~2  J]  [/,(*)  -  I2(h(s,  b,  z(s,  «)))f.  ( 14) 

Then  eD(a)  is  a  minimum  at  a  =  aT.  Our  problem  is  to 
estimate  aT  by  minimizing  (14)  with  respect  to  a.  An 
interpretation  of  (13)  is  that  the  image  I2(u)  can  be  trans¬ 
formed  into  the  image  I\(s)  by  a  varying  scale  change  that 
locally  consists  of  a  rotation,  a  nonisotropic  stretching, 
and  a  translation. 

The  minimization  technique  we  have  used  is  gradient 
descent.  The  gradient  of  (14)  is 

^  =  "2M"2J5.[/i(s)“/2(“)i^}  (15) 

where  from  (12) 

u  =  h(s ,  b ,  z(s,  a))  =  Cn5  +  c]2z(s,  a )  +  e.  (16) 

Equation  (15)  is  an  explicit  function  of  s ,  b ,  z,  and  is 
implicitly  dependent  on  a  because  of  the  determination  of 
z  by  s  and  a.  Use  of  the  chain  rule  gives2 

dl2(u)  _  dl2(u)  dh(s ,  b ,  z(s,  a)) 

da  du  da 

_  9/2(m)  dhjs,  b,  zjs,  a))  dz(s,  a)  ,  . 

du  dz  da 

with  dh  ( s ,  b ,  z(s ,  a)) / dz  =  c12.  In  general,  it  may  be 
inconvenient  to  express  z  as  an  explicit  function  of  a.  In 
such  cases  to  proceed  further  denote  the  left  side  of  (6) 
by  g(*,  y,  z,  a).  Then  (6),  the  equation  of  the  object  sur¬ 
face,  is  g(x,  y,  z,  a)  =0.  Hence,  we  can  write 

3z(s,  a)  _  dg{x,  y,  z,  a)  Idgjx,  y,  z,  a) 

da  da  /  dz 

2The  notation  used  here  is  that  5/2(m) /da  is  a  K  component  row  vector, 
and  dh{s,  b,  z{s,  a)) /da  is  a  2  X  K  matrix. 


Putting  (16)— (18)  together  results  in 

^  =  2M-2S  [im-i2(u)] 
da  seD  1 


dl2(u)  dh(s ,  b,  z) 
du  dz 


dgjx,  y,  z,  a)  Idgjx,  y,  z,  a) 
da  I  dz 


(19) 


Hence,  computation  of  (19)  involves  processing  the  data 
to  obtain  /2(«),  and  dl2(u)/du ,  and  computing  c12 
and  the  last  two  partial  derivatives  from  the  known  cam¬ 
era  motion  and  knowledge  of  g(x,  y,  z,  a). 

A  steepest  descent  algorithm  for  minimizing  (14)  is 


&n+  1 


deD(a„) 

da 


(20) 


We  use  a  An  that  depends  on  eD(an )  and  deD(an) / da  and 
has  magnitude  that  goes  to  0  as  n  goes  to  infinity. 

A.  Algorithm  Operation-Interpretation ,  and 
Experiments  with  a  Sphere 

To  illustrate  the  approach,  consider  a  spherical  surface 
described  by  the  equation 

(x  -  X0)2  +  (y  -  y0)2  +  (z  -  z0)2  =  R1  (21) 


For  this  surface,  z  can  be  solved  for  explicitly,  via 

2  =  Zo  ±  {R2  -  (x  -  X0f  -  (y  -  y0 f)U~.  (22) 

The  positive  square  root  is  used  since  the  outside  surface 
of  the  sphere  is  seen  by  the  camera  looking  in  the  negative 
z  direction.  Hence, 

/ dz(s ,  a)\  _  (  dz  dz  dz  dz\ 

\  da  )  \dxo  dyo  dzo  dR/ 

=  ((x  -  x0)/(z  -  Zo),  {y  -  yo)/{z  -  Zo), 

1,  R/{z  —  z0))  (23) 

and  z  -  zo  =  (R2  ~  (x  -  x0)2  -  (y  -  >'o)2)1/2-  The 
vector  dz/da  can  be  computed  directly  from  this. 

The  analogous  equations  for  planes,  cylinders  and  gen¬ 
eral  quadrics  are  presented  in  Appendixes  C,  D,  and  E, 
respectively. 

Fig.  2  is  useful  for  illustrating,  in  two  dimensions,  the 
operation  of  our  algorithm  for  estimating  aT.  Spheres  in 
3-D  are  shown  as  circles.  Consider  the  processing  of  the 
image  patch  between  points  s'  and  s"  in  Frame  1.  This 
patch  is  the  image  of  the  patch  between  points  pr  and  p" 
on  the  true  sphere  labeled  aT .  The  same  patch  on  the 
sphere  surface  gives  rise  to  the  image  patch  between  points 
u'  and  u"  in  Frame  2.  Now  suppose  the  system’s  esti¬ 
mation  of  aT  is  a.  The  associated  sphere  is  shown.  The 
performance  functional  for  the  estimate  of  a  is  given  by 
(14)  and  is  computed  as  follows.  The  system  thinks  that 
the  locations  on  the  sphere  surface  that  give  rise  to  the 
images  at  points  sf  and  s”  in  Frame  1  are  the  intersections 
of  the  dashed  lines,  from  s'  and  s" ,  with  the  sphere  la¬ 
beled  a.  These  sphere  surface  points  would  be  seen  as  the 
images  at  point  ti'  and  u”  in  Frame  2.  Hence,  the  system 
takes  the  image  patch  between  points  u '  and  u”  in  Frame 
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Fig.  2.  Two-dimensional  illustration  of  the  geometry  for  the  surface  esti¬ 
mation  algorithm. 

2  and  assumes  that  the  image  at  each  point  u  in  this  in¬ 
terval  is  the  same  image  as  the  image  at  a  point  s  in  the 
interval  between  sf  and  s”  in  Frame  1.  The  points  u  and 
s  are  related  geometrically  as  in  the  figure,  or  algebrai¬ 
cally  by  (4).  Performance  functional  (14)  requires  com¬ 
puting  the  error  Ix(s)  -  I2(h(s ,  b ,  z(s,  a))).  Of  course, 
in  the  absence  of  image  noise,  (14)  is  zero  only  when  a 
=  aT. 

We  make  the  following  interesting  observations.  From 
the  geometry  of  image  formation  in  Fig.  2,  the  varying 
scale  change  that  maps  the  image  patch  over  interval  [ s' , 
s"  ]  in  Frame  1  into  the  image  patch  over  interval  [«',  u"  ] 
is  seen.  Note  that  both  a  scale  change  and  a  translation 
are  involved  in  this  2-D  illustration. 

If  the  incorrect  a  is  used  in  computing  the  performance 
functional  (14),  the  patch  of  image  used  in  Frame  2  is  that 
over  the  interval  [w\  u"  ].  Note  that  this  interval  is  both 
a  shift  and  a  varying  scaling  of  the  interval  [u\  u"].  If 
instead  of  a  sphere,  we  were  dealing  with  a  planar  sur¬ 
face,  the  scale  change  would  be  constant  throughout  the 
image. 

The  interpretation  is  further  illustrated  by  the  following 
3-D  computer  simulations.  Fig.  3(a)  and  (b)  are  two 
frames  illustrating  images  of  the  same  sphere  but  taken  at 
different  locations  and  orientations.  The  angle  between  the 
optical  axes  of  the  two  cameras  is  45  ° .  The  data  was  gen¬ 
erated  by  taking  a  real  image  with  a  T.V.  camera  and 
projecting  the  image  pattern  onto  a  sphere,  from  a  few 
different  directions.  The  pattern  projected  onto  the  sphere 
in  this  way  is  the  pattern  that  is  then  viewed  by  the  two 
cameras  to  create  Frames  1  and  2.  Note  that  all  these  pro¬ 
jections  are  done  by  computer  simulation.  The  image 
patch  used  in  Frame  1  is  the  interior  of  the  square  shown 
there.  Point  P  is  its  center  point.  Points  P  in  Frames  1  and 
2  are  locations  of  images  of  the  same  point  on  the  3-D 
sphere  surface.  Note  how  the  image  in  the  vicinity  of  Point 


(a)  (b) 

Fig.  3.  Two  computer  generated  images  of  a  sphere  taken  at  different  lo¬ 
cations  and  orientations.  (Section  II-A.) 


Fig.  4.  Image  boxes  are  numbered  1  through  6,  running  left  to  right,  top 
to  bottom.  Box  1  is  taken  by  camera  1.  Box  k  (k  =  2,  3,  •  •  •  ,  6)  is 
associated  with  the  estimate  of  a  in  line  k  of  Table  I. 


TABLE  I 


Image  Box 

_ ? _ _ _ 1 

xn 

zn 

Zn 

R 

2 

40.00 

40.00 

-2040.00 

128 

3 

20.83 

40.37 

-2016.74 

128 

4 

20.79 

40.40 

-2004.72 

128 

5 

10.52 

30.30 

-2009.08 

128 

6 

0.41 

1.13 

-1999.89 

128 

-_aT  . 

0.00 

0.00 

-2000.00 

128 

a 

0.41 

1.13 

-1999.89 

128 

P  in  Frame  2  is  a  varying  scaled  transformation  of  the 
image  in  the  vicinity  of  Point  P  in  Frame  1.  In  Fig.  4,  we 
refer  to  the  six  image  boxes  as  1  through  6  starting  with 
the  upper  left  and  moving  left  to  right  and  top  to  bottom. 
Box  1  is  the  image  shown  in  the  square  window  in  Fig. 
3(a).  The  system  begins  with  a  guess  as  to  a.  This  initial 
guess  for  a  is  shown  as  that  associated  with  Box  2  in  Table 
I.  Using  this  a,  for  each  point  s  in  the  window  in  Fig.  3(a) 
the  system  takes  the  image  at  point  h(s ,  b ,  z(s ,  a))  in 
Fig.  3(b)  and  puts  this  picture  function  value  at  location 
s  in  an  array.  The  image  so  formed  is  that  in  Box  2.  It  is 
the  difference  of  the  picture  functions  of  these  two  images 
that  enters  as  Ix(s)  -  I2(h(sy  b ,  z(s ,  a)))  in  (14).  (If  this 
a  were  equal  to  aT ,  then  in  the  absence  of  image  noise, 
the  image  in  Box  2  would  be  identical  to  that  in  Box  1 . ) 
Box  3  is  the  image  formed  in  this  way  using  a{ ,  the  pa¬ 
rameter  vector  following  the  first  iteration  of  our  param- 
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eter  estimation  algorithm.  In  Boxes  4  and  5  are  the  trans¬ 
formed  images  associated  with  a  values  at  intermediate 
stages  of  the  estimation  process.  Finally,  Box  6  is  the  im¬ 
age  associated  with  the  a  value  found  at  the  last  estima¬ 
tion  stage.  These  a  values  for  the  stages  associated  with 
the  six  boxes  are  shown  in  Table  I.  If  the  final  estimate 
a  is  equal  to  aT ,  then  in  the  absence  of  image  noise,  the 
images  in  boxes  1  and  6  would  be  identical. 

Figs.  5  and  6  show  the  shape  of  the  error  function  eD(a) 
for  the  experiment  described.  Fig.  5  shows  the  graph  of 
the  error  function  produced  by  holding  Zo  and  R  fixed  at 
the  true  values  and  varying  jc0  and  y0  over  the  ranges  — 150 
<  x0  <  200  and  —170  <  y0  <  170  where  aT  is  given  in 
Table  I.  Fig.  6  shows  the  same  error  function  with  x0  and 
R  held  fixed  and  y0  and  Zq  varied  over  the  ranges  -90  < 
y0  <  120  and  —2120  <  Zq  <  —1880.  Note  that  the  error 
function  changes  much  more  rapidly  with  change  in  Zo 
than  with  change  in  x0  or  y0  where  z0  is  the  distance  from 
the  sphere  center  to  the  camera. 

B.  Experiments  with  a  Cylinder  when  Using  Two 
Images 

There  are  a  number  of  possible  parameterization  for  the 
cylinder.  The  following  one  has  been  found  to  be  desir¬ 
able  for  use  in  the  algorithm  given  in  (20).  Consider  Fig. 
7.  Then  the  cylinder  axis  location  and  orientation  are 
specified  by  the  parameter  vector  (x0,  z0>  </>)•  We  ar¬ 

bitrarily  choose  y0  to  be  any  point  in  the  vicinity  of  the 
image  patch  being  processed  in  CRF1 .  Then  x0  and  z0  are 
the  remaining  coordinates  to  be  determined  for  specifying 
a  point  on  the  cylinder  axis.  As  can  be  seen  in  Fig.  7,  d 
and  <t>  are  the  angles  specifying  the  cylinder  axis  orienta¬ 
tion,  with  0  being  the  angle  between  the  z  axis  and  the 
projection  of  the  cylinder  axis  into  the  xz  plane,  and  <j> 
being  the  angle  between  the  y  axis  and  the  cylinder  axis. 

Fig.  8(a)  and  (b)  are  two  frames  illustrating  images  of 
the  same  cylinder  but  taken  at  different  camera  locations 
and  orientations.  The  angle  between  the  optical  axes  of 
the  two  cameras  is  45  ° .  The  data  were  generated  and  the 
images  were  formed  in  the  same  way  as  described  in  Sec¬ 
tion  II- A  for  the  sphere.  The  image  patch  used  in  image 
1  is  the  interior  of  the  square  shown  there  in  Fig.  8(a). 
The  portion  of  the  cylinder  surface  seen  in  this  square 
patch,  is  seen  as  the  patch  in  the  four  sided  polygon  in 
image  2  shown  in  Fig.  8(b).  (Observe  that  there  is  a 
dashed  line  along  the  left  border  of  the  cylinder  image  in 
Fig.  8(b).  This  is  due  to  spatial  quantization  and  the  fact 
that  the  border  occurs  at  the  image  of  a  portion  of  the 
surface  pattern  where  there  is  a  discontinuity  from  a  white 
region  to  a  dark  region  in  the  pattern  intensity.)  In  Fig. 
9,  we  refer  to  the  six  image  boxes  as  1-6  starting  with  the 
upper  left  and  moving  left  to  right  and  top  to  bottom.  The 
pictures  shown  are  similar  to  those  in  Section  II-A  for  a 
sphere.  The  system  begins  with  a  guess  for  a.  This  initial 
guess  is  given  in  line  1  in  Table  II.  As  in  Section  II-A, 
using  this  a,  for  each  s  e  D  the  system  takes  I2(h(s ,  b , 
a))  and  puts  it  at  location  s  in  an  array,  thus  generating 
the  image  shown  in  Box  1 .  Boxes  2-5  are  images  formed 


Fig.  5.  Error  function  produced  by  holding  z0  and  R  fixed  at  the  true  values 
and  varying  jc0  and  y0  over  the  ranges  —  1 50  <  -c0  <  200  and  — 170  < 
Yo  <  170. 


Fig.  6.  Error  function  with  x0  and  R  held  fixed  and  y0  and  20  varied  over 
the  ranges  —90  <  y0  <  120  and  —2120  <  z0  <  1880. 


Fig.  7.  Parameterization  of  cylindrical  surfaces  used  with  the  experiments 
in  Section  II-B . 


1" 

IP  * 

5s: 

f  ’  * 

S. 

0 

(a)  (b) 

Fig.  8.  Two  computer  generated  images  of  a  cylinder  taken  at  different 
locations  and  orientations.  (Section  Il-B.) 


in  this  way  for  a  subset  of  the  sequence  of  values  for  a 
occurring  in  the  iterative  estimation  of  aT.  The  image  con¬ 
structed  in  this  way  for  the  final  estimate  of  aT  is  shown 
in  Box  5.  Box  6  is  the  image  constructed  given  the  true 
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Fig.  9.  Image  boxes  are  numbered  1  through  6,  running  left  to  right,  top 
to  bottom.  Box  k  (k  =  1,2,  •  •  •  ,6)  is  associated  with  the  estimate  of 
a  in  line  k  of  Table  I. 


TABLE  II 


Image  Box 

a  _  1 

xn 

_?Q 

8 

9 

1 

-40.0 

-1960.0 

30.0 

50.0 

2 

-38.9 

-1968.7 

36.0 

38.2 

3 

-24.8 

-1994.5 

43.2 

23.5 

4 

-11.6 

-2000.2 

47.0 

27.1 

5 

-0.7 

-2000.2 

50.9 

31.7 

6 

0.0 

-2000.0 

53.7 

31.6 

surface  parameters.  This  is  the  same  patch  as  marked  in 
Fig.  8(a).  Note  that  the  images  in  Boxes  5  and  6  are  al¬ 
most  identical,  as  they  should  be.  The  parameter  esti¬ 
mates  are  shown  in  Table  II.  The  final  estimates  are  close 
to  the  true  values. 

The  error  as  a  function  of  x0,  z 0  in  the  vicinity  of  aT  is 
shown  in  Fig.  10.  The  global  minimum  is  in  the  first  val¬ 
ley  in  the  figure.  The  somewhat  multimodal  error  function 
plotted  is  400  by  400  units,  for  a  cylinder  radius  of  128 
units.  An  enlargement  of  this  function  in  the  vicinity  of 
a7is  shown  in  Fig.  10(b).  The  plot  shown  is  160  by  160 
units.  Notice  that  the  function  changes  rapidly  in  the  z0 
direction,  but  much  more  slowly  in  the  jc0  direction.  Fig. 
11(a)  is  a  plot  of  the  error  as  a  function  of  6  and  <j>.  The 
extent  of  the  plot  is  400  degrees  in  each  direction.  Fig. 
1 1(b)  is  a  40  by  40  degree  plot  of  a  portion  about  aT .  Note 
that  there  is  slow  variation  as  a  function  of  <t>.  The  reason 
is  that  for  the  geometry  of  this  example,  the  object  surface 
patch  involved  is  roughly  parallel  to  the  plane  that  the 
cylinder  axis  moves  in  when  4>  is  varied  but  6  is  held  fixed. 
Hence,  the  geometry  is  much  like  that  of  a  planar  surface 
being  moved  in  a  plane  parallel  to  the  surface.  The  error 
function  that  we  are  using  is  insensitive  to  such  motion, 
i.e.,  to  such  parameter  variation.  The  error  function  does 
change  rapidly  for  parameter  variation  in  other  directions, 
e.g.,  the  direction  of  0  for  the  specific  aT  involved  here. 

C.  Experiments  with  a  Cylinder  when  Using  Two  Edge 
Maps 

The  preceding  theory  is  predicted  on  the  assumption  of 
a  Lambertian  surface,  i.e.,  that  a  point  on  the  object  sur¬ 
face  appear  with  the  same  image  intensity  no  matter  what 
the  location  and  orientation  of  the  camera  be.  Now  most 


surfaces  have  some  specular  (mirror)  component.  If  the 
Lambertian  assumption  does  not  apply,  then  an  edge  de¬ 
tector  can  be  run  over  the  image,  and  pixels  located  at 
large  discontinuities  in  the  picture  function  are  given  a 
fixed  large  value.  All  other  pixels  are  given  value  0.  Then 
this  resulting  array  is  low-pass  filtered  to  produce  what 
we  are  calling  the  edge  map.  This  edge  map  should  be  a 
function  only  of  the  pattern  on  the  object  surface,  and  not 
of  the  illumination  nor  of  the  reflection  properties  of  the 
surface,  except  for  the  rare  situation  of  a  very  highly  spec¬ 
ular  surface  reflecting  an  illumination  intensity  pattern 
having  sharp  large  spatial  discontinuities.  We  treat  this 
edge  map  as  a  regular  image  and  use  it  in  our  algorithm. 
Fig.  12(a)  and  (b)  are  obtained  by  edge  detection  of  Fig. 
8(a)  and  (b),  respectively.  They  are  then  low  pass  filtered 
to  obtain  the  edge  maps  used  in  the  surface  reconstruction 
algorithm.  Table  III  lists  the  estimates  of  aT  at  a  number 
of  iterations  in  the  algorithm,  and  Fig.  13  is  a  sequence 
of  reconstructions  analogous  to  those  in  Fig.  9.  The  min¬ 
imum  of  (14)  as  a  function  of  a  is  larger  when  using  the 
edge  maps  rather  than  the  images  in  Fig.  8(a)  and  (b). 
The  reason  is  that  the  edges  associated  with  a  point  on  the 
object  surface  have  different  widths  in  Fig.  8(a)  and  (b). 
Hence,  even  in  the  absence  of  sensor  noise,  this  perfor¬ 
mance  functional  will  not  be  zero  when  a  =  aT.  Never¬ 
theless,  the  estimation  of  aT  based  on  the  edge  map  data 
does  work  reasonably  well. 
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(b) 

Fig.  11.  (a)  Plot  of  the  error  function  as  a  function  of  d  and  <f>  centered  at 
aT.  (b)  An  enlargement  of  the  above  error  function  in  the  vicinity  of  aT. 


(a)  (b) 

Fig.  12.  Edge  maps  obtained  by  simple  edge  detection  of  Fig.  8(a)  and 
(b).  (Section  II-C.) 

D.  Experiments  with  a  Cube  Having  Planar  Surfaces 

Two  things  of  interest  are  illustrated  here.  First,  image 
1,  the  image  of  a  cube,  is  partitioned  into  small  windows, 
and  the  behavior  of  the  estimates  of  the  surface  patches 
seen  within  these  windows  is  studied.  Some  windows  see 
portions  of  only  one  planar  surface,  and  some  see  portions 
of  two,  or  even  three,  planar  surfaces.  Second,  the  error 
function  (14)  is  unimodal  but  shallow  over  a  large  region 
in  the  3-D  parameter  space  when  the  angle  between  the 
camera  optical  axes  is  small,  e.g.,  1°.  And  the  function 


TABLE  III 


Fig.  13.  Image  boxes  are  numbered  1  through  8,  running  left  to  right,  top 
to  bottom.  Box  k  (k  =  1,  2,  •  ■  ■  ,  7)  is  associated  with  the  estimate  of 
a  in  line  k  of  Table  III.  Box  8  is  the  marked  window  shown  in  Fig.  12(a). 


u  u' 

Fig.  14.  Two  pixels  in  image  plane  2  may  correspond  to  one  pixel  in  im¬ 
age  plane  1,  as  shown  here. 

is  multimodal  over  this  region  but  with  a  narrow  valley 
about  the  true  parameter  value  when  the  angle  between 
the  camera  optical  axes  is  large.  This  behavior  is  ex¬ 
ploited  to  arrive  at  a  computationally  simple  search  al¬ 
gorithm  for  3-D  surface  parameter  estimation  by  using  a 
sequence  of  images  (four  in  this  example),  where  the  in¬ 
creasing  angles  between  the  optical  axis  of  the  first  cam¬ 
era  and  those  of  successive  ones  take  the  values  1°,  3°, 
10°. 

Fig.  15(a)  shows  image  1,  a  computer  generated  image 
of  a  comer  of  a  cube,  partitioned  into  many  small  win¬ 
dows.  A  plane  is  specified  by  the  normal  vector  n,  and 
the  distance  d  in  the  direction  of  the  z  axis,  from  the  cen¬ 
ter  of  the  image  window  to  the  planar  patch.  We  specify 
the  normal  vector  by  the  two  angles,  f  and  p,  as  shown 
in  Fig.  16.  Then,  the  equation  for  the  plane  with  respect 
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Fig.  15.  (a)  shows  image  1 ,  a  computer  generated  image  of  a  comer  of  a  cube,  partitioned  into  many  small  windows,  (b)  shows 
image  2  marked  with  regions  corresponding  to  the  windows  in  image  1,  using  the  estimates  based  on  image  1  and  image  2. 
(c)  shows  image  3  marked  with  regions  corresponding  to  the  windows  in  image  1,  using  the  estimates  based  on  image  1  and 
image  3.  (d)  shows  image  4  marked  with  region  corresponding  to  the  windows  in  image  1,  using  the  estimates  based  on  image 
1  and  image  4.  (e)  shows  the  values  of  the  estimated  parameters  based  on  image  1  and  image  4. 
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to  CRF1,  which  is  the  ORF,  is 

z  =  (-tan  i)  {x  -  x0)  +  (y  -  ?o)  +  (~d) 

or 

z  =  ( -tan  t)x  +  y  +  ( -d' ) 

\cos  v y ! 

where  x0  and  y0  are  the  2-D  coordinates  of  the  center  of 
the  chosen  window  and  d  is  a  function  of  xQ  and  y0 .  The 
distance  d'  here  is  the  distance  along  the  z  axis  from  the 
origin  to  the  plane.  In  order  to  compare  the  parameter  val¬ 
ues  of  different  surface  patches,  let  the  vector  a  denote 
the  three  parameters,  \p,  p,  and  d’ .  Since  the  image  of  the 
cube  is  generated  by  the  computer,  we  know,  for  exam¬ 
ple,  that  the  true  parameter  aT  for  the  top  plane  of  the  cube 
is  (xjy  =  0°,  p  =  -45°,  d'  =  1638). 

As  mentioned,  the  error  function  to  be  minimized,  (14), 
is  smooth  and  unimodal  over  a  large  region  in  object  pa¬ 
rameter  space  when  the  angle  between  opitcal  axes  and 
the  difference  in  locations  of  the  two  cameras  are  small. 
This  occurs  because  for  a  small  baseline,  the  stereo  dis¬ 
parity  changes  slowly  with  change  in  3-D  surface  param¬ 
eters.  However,  when  the  baseline  is  large,  (14)  becomes 
multimodal  with  a  narrow  valley  about  aT.  Since  we  have 
a  sequence  of  images  available,  we  can  begin  with  a  small 
baseline  image  pair.  We  can  start  the  gradient  descent  vir- 
tuely  anywhere  in  the  parameter  space.  Because  of  the 
unimodality  of  (14),  the  algorithm  quickly  converges  to 
the  minimum  of  the  valley.  The  estimate  of  aT  found  by 
the  algorithm  will  not  be  accurate,  because  the  valley  is 
broad.  But  this  estimate  will  lie  within  the  valley  contain¬ 
ing  aT  of  (14)  for  a  longer  baseline  geometry,  and  can 
therefore  be  used  as  the  starting  value  for  a  new  estima¬ 
tion  of  aT  based  on  image  1  and  image  3\  The  process  is 
repeated  again,  etc.  Four  images,  Fig.  15(a)-(d),  with  the 
second,  third,  and  fourth  optical  axes  making  angles  1°, 
3°,  10°  with  the  first  optical  axis,  are  used  in  the  exper¬ 
iment.  The  initial  parameter  value  used  in  the  estimation 
algorithm  is  (\p  =  0.0,  p  =  0.0,  d  =  2000)  for  all  the 
windows  in  the  image  containing  the  three  sides  of  the 
cube.  That  is,  we  start  with  an  initially  guessed  plane  that 
is  perpendicular  to  the  optical  axis  of  the  first  camera  po¬ 
sition,  and  allow  it  to  tilt,  rotate,  and  shift  to  the  true 
position  by  searching  for  the  best  match  in  the  parameter 
space.  As  mentioned  before,  d  is  the  distance  from  the 
center  of  the  image  window  to  the  planar  patch  in  the  di¬ 
rection  of  the  z  axis.  The  first  image  is  partitioned  into 


64  square  windows,  each  with  64  x  64  pixels,  shown  in 
Fig.  15(a).  The  algorithm  is  run  on  the  first  two  images, 
independently  for  each  window  in  Fig.  15(a).  Based  on 
the  convergent  surface  parameters,  the  regions  in  image 
2  corresponding  to  the  windows  in  image  1  are  drawn  in 
Fig.  15(b)  for  illustration.  Windows  numbered  in  white 
are  those  for  which  the  estimated  plane  is  consistent  with 
the  two  images,  i.e.,  (14)  is  small,  whereas  windows 
numbered  in  black  are  those  for  which  such  consistency 
is  absent,  i.e.,  (14)  is  large.  A  threshold  was  arbitrarily 
set  at  E  =  60  for  this  determination.  Here,  most  of  the 
convergent  parameter  estimates  are  still  quite  far  from  the 
true  surface  parameter  values,  but  are  much  closer  to  the 
true  ones  than  are  the  initial  guesses.  Using  these  conver¬ 
gent  estimates  as  the  initial  values,  the  algorithm  is  run 
on  image  1  and  image  3,  and  then  image  1  and  image  4. 
The  convergent  parameters  and  their  corresponding  error 
measure  are  displayed  in  Fig.  15(e).  The  corresponding 
regions  in  image  4  are  shown  in  Fig.  15(d).  The  results 
are  surprisingly  good  considering  that  the  angle  shift  be¬ 
tween  the  first  and  the  last  image  is  only  10°,  and  the  data 
used  for  each  local  estimate  are  only  a  small  region  in 
each  image.  The  ultimate  accuracy  is  obtained  by  increas¬ 
ing  the  baseline,  and  by  optimally  using  the  data  in  all 
images  simultaneously  [27], 

The  algorithm  in  this  paper  estimates  small  surface 
patches  separately.  In  a  complete  system,  those  patches 
constituting  the  same  primitive  surface  should  be  grouped 
together.  Segmentation  into  primitive  surfaces  and  opti¬ 
mal  primitive  surface  estimation  can  be  achieved  using 
maximum  likelihood  clustering  as  developed  in  [26]. 

In  the  above  experiment,  the  gradient  descent  method 
requies  about  ten  iterations  for  each  window,  and  it  takes 
less  than  one  second  of  running  time  on  a  Sun-3  com¬ 
puter.  While  choosing  an  alternative  surface  parameter¬ 
ization  and  optimizing  the  codes  can  reduce  the  compu¬ 
tation  time,  improvement  of  the  order  of  more  than  102 
over  sequential  processing  can  be  achieved  with  parallel 
processing  because  the  computation  required  for  each 
pixel  within  the  window  is  independent,  and  can  be  run 
in  parallel  for  these  several  hundred  pixels.  Therefore, 
this  algorithm  is  well  suited  to  real-time  applications 
through  use  of  parallel  processors. 

Most  of  the  surface  patches  seen  in  the  windows  in  im¬ 
age  1  are  estimated  accurately.  There  are  a  few  windows 
for  which  this  does  not  happen,  primarily  because  such  a 
window  sees  portions  of  two  or  more  planar  surfaces,  thus 
violating  the  model  used  which  is  that  each  window  sees 
only  one  surface,  or  because  the  surface  patch  seen  in  the 
window  is  not  seen  in  the  second  image.  However,  these 
mismatches  can  be  detected  by  checking  the  size  of  (14), 
or  by  using  Bayesian  decision  theory.  If  a  window  in  im¬ 
age  1  contains  a  patch  of  constant  image  intensity,  then 
the  system  knows  that  a  good  estimate  of  3-D  surface 
patch  using  only  local  data  cannot  be  obtained,  and  there¬ 
fore  does  not  process  that  window  data  here. 

Initial  experiments  with  real  data  are  illustrated  in  Fig. 
17(a)-(d).  A  242  X  256  pixel  CCD  camera  is  mounted 
on  a  robot  arm.  The  cube  shown  in  dashed  lines  in  Fig. 
17(a)  and  (d)  is  the  calibration  cube  with  12"  side,  used 
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Fig.  17.  (a)  shows  the  relative  positions  between  the  camera  and  the  object 
(the  shaded  rectangular  box);  (b)  and  (c)  are  views  of  the  cube  in  1st  and 
8th  camera  positions,  respectively,  (d)  shows  the  computer  reconstruc¬ 
tion  of  the  3D  surfaces  using  the  estimation  results. 


for  calibrating  the  camera.  It  is  removed  prior  to  taking 
the  image  data.  The  object  to  be  estimated  is  a  rectangular 
box,  as  shown  shaded  in  Fig.  17(a).  The  box  is  slightly 
smaller  than  the  calibration  cube.  Xeroxed  copies  of  jour¬ 
nal  pages  have  been  pasted  on  the  box  to  provide  the  sur¬ 
face  patterns.  The  camera  is  moved  through  a  sequence 
of  eight  positions.  This  trajectory  with  respect  to  the  cube 
geometry  is  shown  in  Fig.  17(a).  The  camera  is  roughly 
r  above  the  box.  Angles  between  the  camera  optical  axes 
in  the  1st  and  2nd  positions,  and  the  2nd  and  3rd  positions 
are  each  slightly  less  than  0.5°.  Angles  between  subse¬ 
quent  pairs  of  positions  are  larger,  and  the  angle  between 
the  optical  axes  in  the  1st  and  8th  positions  is  14°. 

Fig.  17(b)  and  (c)  are  views  of  the  cube  in  1st  and  8th 
camera  positions,  respectively.  The  procedure  here  is 
similar  to  that  for  Figs.  15  and  16,  except  that  here  we 
are  using  real  data  and  the  pinhole  model  (perspective 
projection)  for  the  camera. 

Image  1  sees  only  two  surfaces  of  the  cube.  The  image 
is  divided  into  40  X  40  pixel  windows.  Image  8  shows 
the  windows  that  correspond  to  windows  in  Image  1  based 
on  the  algorithm’s  estimate  of  the  planar  surface  patch  for 


each  window.  Notice  that  most  correspondences  are  good. 
Since  the  window  in  row  4  column  2  in  Image  1  consists 
of  essentially  constant  image  intensity,  i.e.,  no  pattern, 
the  associated  planar  surface  estimate  will  be  poor,  and 
that  can  be  seen  by  looking  at  the  estimate  of  the  associ¬ 
ated  region  in  Image  8.  Also,  for  some  windows  in  Image 
1 ,  the  associated  region  in  the  camera  8  image  plane  will 
be  missing  some  image  data,  i.e.,  these  windows  extend 
beyond  the  view  angle  in  the  8th  camera  position.  One  of 
course  expects  to  have  poor  3-D  surface  estimates  for 
these  regions.  These  phenomena  can  be  seen  in  the  esti¬ 
mated  3-D  surface  patches  in  Figs.  17(d).  Note  also  that 
where  a  window  in  Image  1  views  portions  of  two  sur¬ 
faces  (i.e.,  on  the  boundary  where  two  surfaces  intersect) 
the  estimated  surfaces  have  orientations  between  those  of 
the  true  surfaces. 

E.  Minimization  of  eD(a)  is  Maximum  Likelihood 
Estimation 

In  this  section,  we  show  that  the  a  which  minimizes 
eD(a ),  (14),  is  in  fact  the  maximum  likelihood  estimate 
(mle)  of  a  for  the  probabilistic  model  that  we  now  discuss. 


1040 


IEEE  TRANSACTIONS  ON  PATTERN  ANALYSIS  AND  MACHINE  INTELLIGENCE.  VOL.  11,  NO.  10.  OCTOBER  1989 


Let  r  =  (x,y,z)T,  as  before,  be  the  Cartesian  coordinates 
of  a  point  in  three-dimensional  space.  A  surface  is  defined 
in  general  as  the  set  of  points  whose  coordinates  are  func¬ 
tions  of  two  independent  parameters.  Thus,  the  equations 
of  a  surface  can  be  written  as  x  =  x(rl5  t2 ),  y  =  y(tx,  t2 ), 
Z  =  z(t i,  t2)  where  tx  and  t2  are  the  independent  param¬ 
eters.  In  other  words,  any  point  on  the  surface  is  uniquely 
determined  by  two  numbers,  tx  and  t2 ,  and  we  shall  call  t 
=  ( tx ,  t2 ) T  the  curvilinear  coordinates  of  a  point  on  the 
surface.  Of  course,  the  choice  of  the  curvilinear  coordi¬ 
nate  system  is  not  unique.  Fortunately,  we  do  not  have  to 
compute  it  since  our  algorithm  for  estimating  a  turns  out 
to  be  independent  of  the  choice  of  the  curvilinear  coor¬ 
dinate  system. 

Let  us  choose  an  arbitrary  curvilinear  coordinate  sys¬ 
tem  on  that  surface  which  is  described  by  a.  Let  s  and  u 
be  points  in  image  frames  1  and  2  that  are  views  of  points 
t  and  f',  respectively,  on  the  object  surface.  Then  there 
are  functions  qx(s ,  a)  and  q(u,  b ,  a)  such  that 

t  =  q,(s,  a),  t'  =  q(u,  b,  a), 

s  =  «)>  u  =  q~l(t',  b,  a).  (24) 

(The  notations  qx(s ,  a)  and  q(u,  b ,  a)  differ  because  we 
are  taking  CRF1  to  be  the  world  coordinate  system. ) 

Let  p(t)  denote  the  brightness  of  the  object  surface  at 
point  t.  We  shall  call  /*(  • )  the  surface  pattern.  The  sur¬ 
face  pattern  p(t)  is  seen  in  image  planes  1  and  2  as  px(s) 
and  (jl 2(a),  respectively.  Thus,  fix(s)  =  p(qx(s,  a )),  and 
p2(u)  ~  b,  a)).  In  practice,  what  is  observed  at 

s  and  u  is 

A(s)  =  fMM  +  vt>,(s) 

h(u)  =  n2(u)  +  w2(u )  (25) 

where  w^s),  for  all  s ,  and  w2(w),  for  all  w,  are  zero  mean, 
homogeneous  white  Gaussian  noise  processes  having 
common  variance  a2,  i.e.,  they  are  independent  identi¬ 
cally  distributed  (i.i.d.)  random  variables.  Thus,  px(s)  = 
^ [ /i( -s ) ]  and  p2(u)  ~  E[I2(u)]  where  E[  •  ]  denotes  ex¬ 
pectation  of  the  random  variable  within  the  brackets. 

Let  Dx  and  D2  denote  regions  in  image  frames  1  and  2, 
respectively,  and  let  Ix  and  let  /2  be  vectors  with  compo¬ 
nents  Ix{s),  s  e  Dx,  and  I2(u ),  u  e  D2,  respectively. 
Hence,  Ix  and  I2  represent  the  picture  functions  over  re¬ 
gions  Dx  and  D2  in  image  frames  1  and  2,  respectively. 
Let  p  denote  a  vector  with  components  /x(/)  where  the  t 
are  points  seen  in  Dx  or  D2  or  both.  Then  the  joint  like¬ 
lihood  of  Ix  and  /2  given  a,  a2,  and  fi(t)  for  all  t ,  is 

P(h,  h\a,  fi,  a2) 

=  (2ira2)  _<il/2  exp  ^  [A(s)  “  ^i(*)fj 

•  (2  7r<r2)"*/2  exp)  S  [/2(«)  -  ii2{u)f ( 

l  LO  ueDi  J 


ject  surface  is  an  a  priori  unknown  nonrandom  function, 
so  that  px(s)  =  p(qx(s,  a))  and  p2(u)  =  p(q(u ,  a)) 
are  a  priori  unknown  nonrandom  functions.  Hence,  the 
joint  likelihood  (26)  depends  not  only  on  a,  but  also  on 
a2  and  p{t)  for  all  t.  We  seek  the  maximum  likelihood 
estimate  a  for  the  unknown  a.  Unfortunately,  this  requires 
the  simultaneous  computation  of  the  maximum  likelihood 
estimates  p(t)  and  b1  for  the  unknowns  portions  of  p{t) 
seen  in  the  images,  and  for  a2. 

The  necessity  for  maximum  likelihood  estimation  of 
p(t),  for  those  t  seen  in  Dx  or  D2  or  both,  in  order  to 
realize  maximum  likelihood  estimation  of  a,  is  interest¬ 
ing.  In  this  paper,  p(t)  is  considered  to  be  completely 
arbitrary,  but  in  most  applications  there  is  some  restric¬ 
tive  model  for  the  surface  pattern.  Some  examples  of  re¬ 
stricted  models  are  one  or  more  parameterized  curves,  a 
polynomial  intensity  function,  a  locally  homogeneous 
parameterized  stochastic  process,  or  some  other  parame¬ 
terized  model.  In  all  cases,  the  pattern  model  parameters 
must  be  estimated  jointly  with  the  estimation  of  a.  Recent 
examples  of  this  in  the  literature  are  [28],  [29].  In  [28], 
contours  in  the  image  (i.e.,  curves  across  which  the  image 
intensity  changes  rapidly)  are  modeled  as  polynomial 
curves  with  a  priori  unknown  coefficients,  and  the  data 
used  with  these  models  are  not  the  original  images  but 
rather  the  edge  maps  for  two  or  more  images.  In  [29], 
MRF  models  are  used  for  textured  patterns  on  locally 
planar  3-D  curved  surfaces. 

For  each  surface  point  t  seen  in  image  1  or  image  2,  the 
maximum  likelihood  estimate  of  p(t)  given  a  is  the  jl(t) 
that  maximizes  (26).  Let  u  =  h(s,  b,  aT).  Then,  p(t)  is 
found  to  be  the  following. 

1)  If  surface  point  t  is  seen  at  s  in  Dx  but  not  in  Z)2, 
then  £(0  =  £,(*)  =  /,(*). 

2)  If  surface  point  t  is  seen  at  u  in  D2  but  not  in  Dx, 
then  £(<)  =  /j2(m)  =  /2(«). 

3)  If  surface  point  t  is  seen  in  both  frames  at  s  e  Dx  and 
u  e  D2,  respectively,  then 

A  /  A  /  X  +  h(u) 

MO  =  pM  =  P2{u)  = - - - . 

Let  Dx2  be  a  subset  of  Dx  such  that 

D12  =  js:  s  e  Dx,  and  u  =  h(s,  b ,  z(s ,  a))  e  D2 J , 

i.e.,  the  subset  of  points  in  Dx  in  image  plane  1  that  are 
views  of  surface  points  also  seen  in  D2  in  image  plane  2. 
(Note  that  Dx2  is  a  function  of  a.)  Replacing  /x(  • )  in  (26) 
by  its  maximum  likelihood  estimate  * )  obtained  above, 
we  have 

p(Ix,  I2 1  a,  /x,  a2) 


=  (2tto2) 


-{d\+d2)/i 


■  exp 


1 

2a2 


•  S 

seD\2 


Ijjs)  +  72(h) 

2 


2 


(26) 

where  dx  and  d2  are  the  numbers  of  pixels  in  Dx  and  D2, 
respectively.  We  assume  here  that  the  pattern  on  the  ob- 


+ 


7i(s)  +  /2(n) 
2 
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2\-(<h+<k)/2 

=  (lira  ) 

■exp[“i^[/l(s)_/2(“)]2]  (27) 

where  u  =  h(s,  b,  z(s,  a)).  Therefore,  maximization  of 
the  joint  likelihood  function  /?(/,,  /2|a,  p,  a2)  with  re¬ 
spect  to  a  is  simply  minimization  of  eD (a)  in  (14)  with  D 
=  Dn.  Thus,  the  a  that  minimizes  eD {a)  is  the  maximum 
likelihood  estimate  of  a  when  the  object  surface  pattern 
p  ( • )  is  treated  as  an  a  priori  unknown  deterministic  func¬ 
tion.  (As  can  be  seen  in  (27),  it  is  unnecessary  to  estimate 
a2  in  order  to  estimate  a.) 

We  run  into  a  problem  here  due  to  the  quantization  of 
the  image  into  pixels.  From  Fig.  14,  we  see  that  the 
marked  region  of  object  surface  is  seen  as  two  pixels  cen¬ 
tered  at  u  and  u'  in  image  2,  and  one  pixel  centered  at  s 
in  image  1 .  More  generally,  a  region  on  the  object  surface 
may  be  seen  as  a  pixel  at  s  in  set  Dx  and  as  pixels  at 
u2 ,  •  *  • ,  i*„  in  set  D2.  In  that  case,  (27)  is  not  completely 
correct,  and  a  more  exact  formulation  is  given  in  Section 
II-F.  However,  in  practice  the  use  of  (27)  for  maximum 
likelihood  estimation  works  well,  and  we  use  it. 


to  obtain  the  mle  of  a,  is  (30).  Equation  (30)  is  interesting 
for  two  reasons.  The  first  is  its  simplicity  and  that  it  is 

-  (l /2a2)  [/,($)  -  Iz(u)]2  when  n  =  1.  The  other  rea¬ 
son  is  that  Ifs)  contributes  only  once  to  the  sum  that  must 
be  maximized  rather  than  n  times  as  would  be  the  case  if 

-  ( 1  /2a2)  •  E"=1  [I\(s)  -  I2 (w)]2  were  used. 

Now,  suppose  a  small  region  on  the  object  surface  is 
seen  as  the  pixels  at  S\,  s2,  *  *  *  ,  sn  in  Dx  and  the  pixel  at 
u  in  D2  where  h(si9  b ,  a)  «  u.  Then  preceding  analo¬ 
gously  to  the  preceding  paragraph,  we  have 

Az(«)  =  Uh~'(u’ b' a))  + 

and 

Ai(*i)  =  A(*/)  -  ^  h{h~\u,  b,  a))  +  I2(u) 

where  Jl(h~\u7  b ,  a))  denotes  (1/n)  E"=1  IfSj).  Also, 
upon  replacing  p2(u)  and  the  pfsi)  by  their  mles,  the 
contribution  to  the  exponent  of  (26)  becomes 

-(1/2 a2)  [/,(*-'(«,  b,  a))  -  I2(u)f  (31) 


F.  A  More  Exact  Expression  for  p(lu  J2\  a,  p,  o2) 

Suppose  a  region  on  the  object  surface  is  seen  as  a  pixel 
at  s  in  set  Dx  and  as  pixels  at  u{,  a2,  •  •  •  ,  un  in  set  D2. 
Then  the  contribution  to  the  exponent  in  (26)  of  the  data 
at  these  pixels  is 

-(l/2a2)  f  [/,(*)  -Mf 


Using  results  (30)  and  (31),  we  see  that  a  more  accurate 
expression  for  the  left  side  of  (27)  is 

P{h ,  h\a,  £,  a2) 


2K~{d\+d2)/2 

=  (27ra  )  ex 

-  I2{h{s,  b,  a))]2 


Pi(«) 


+  S  [l2(Ui)  -  M2(«/)f  j  (28) 

where  px(s)  —  (1/n)  E"=1  p2 («/).  This  relationship  be¬ 
tween  pi(s)  and  the  p2 («,)  follows  from  the  assumption 
of  Lambertian  image  formation,  i.e.,  an  incremental  re¬ 
gion  of  the  object  surface  is  seen  with  the  same  brightness 
at  points  in  both  images.  Then  the  mles  for  pfs)  and 
Pi(Ui),  i  =  1,  •  •  *  ,  n,  are  obtained  by  maximizing  (28) 
with  respect  to  pfs)  and  the  p2 (m,)  subject  to  the  con¬ 
straint  among  these  variables.  The  mles  are 

Ai(s)  =  h(h(s,  b,  a))  +  /,(*), 

and 

A z(«i)  =  h(u,)  ~  ^-Lj-72(*(s,  b,  a))  +  I ,(s) 

(29) 

where  l2(h(s,  b,  a))  denotes  (1/n)  £?=1 12(Ui).  Upon  re¬ 
placing  p\(s)  and  the  p2(ui)  in  (28)  by  their  mles,  (28) 
becomes 

-(1/2 a2)  [/,(«)  -  l2{h(s,  b,  a))]2.  (30) 

Hence,  the  contributions  of  Ifs)  and  /2(«f),  i  —  1, 
•  •  •  ,  n,  to  the  function  that  must  be  maximized  in  order 


+  2  [li(h  ‘(m,  b, 

ueh(Di2 -Dl2,b,a) 


(32) 

where  D[2  is  the  set  of  points  s  such  that  s  e  Dl2  and  the 
pixel  at  s  maps  onto  one  or  more  pixels  in  D2.  Similarly, 
D\2  —  £>i2  is  the  set  of  points  s  belonging  to  Dl2  and  not 
D j2,  and  h(Dn  -  D[2,  b,  a)  is  the  set  of  points  u  such 
that  the  pixel  at  u  corresponds  to  pixels  at  two  or  more  s 
in  Dn.  Now  maximization  of  (32)  with  respect  to  a  is 
minimization  of  the  sum  of  the  two  summations  in  the 
exponent. 

There  is  still  a  source  of  inexactness  in  the  use  of  (32) 
for  representing  the  left  side  of  (27).  It  is  that  one  pixel 
in  Dj  will  never  map  onto  an  integer  number  of  pixels  in 
D2  and  vice  versa.  One  result  of  this  is  that  even  if  a2  = 
0  and  the  true  value  of  a  is  used,  the  exponent  of  (32)  will 
not  be  zero.  This  effect  is  equivalent  to  adding  colored 
noises  (i.e.,  nonwhite  noises)  to  the  two  images.  The 
standard  deviation  of  the  quantization  noise  at  a  pixel  in 
image  k  will  be  proportional  to  the  gradient  of  pk  ( • )  at 
the  pixel.  When  a2  is  small,  this  colored  noise  has  a 
greater  effect  than  does  the  white  noise  assumed  in  this 
model. 

In  summary,  if  p(/j,  /2|a,  p,  a2)  is  to  be  maximized 
with  respect  to  a ,  (32)  should  be  used.  However,  the  ap- 
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proximation  (27)  is  computationally  simpler  and  works 
well. 


III.  Cramer-Rao  Lower  Bounds  on  the  Error 
Covariance  for  the  Estimates  of  Surface 
Parameters  a 

In  this  section,  we  derive  lower  bounds  for  the  covari¬ 
ance  matrices  for  the  a  priori  unknown  parameters  of  the 
3-D  objects  and  the  object  surface  patterns  for  those  sit¬ 
uations  where  the  parameters  to  be  estimated  are  treated 
as  unknown  constants  or  as  random  variables.  Note  that 
these  are  fundamental  bounds  that  depend  on  the  raw  im¬ 
age  data  and  any  a  priori  information  concerning  camera 
and  object  surface  geometry.  No  matter  what  object  pa¬ 
rameter  estimation  algorithms  are  used,  the  resulting  ob¬ 
ject  parameter  estimation- errors  can  never  be  smaller  than 
these  bounds.  Furthermore,  if  the  image  data  windows 
used  contain  many  pixels  and  the  object  surfaces  being 
estimated  are  smooth,  then  maximum  likelihood  parame¬ 
ter  estimators  will  have  error  covariance  matrices  given 
approximately  by  the  C.R.  Bound.  These  bounds  require 
computing  the  Fisher  Information  Matrix  [13].  We  com¬ 
pute  this  matrix  and  the  error  covariance  bound.  The  final 
results  are  given  in  (37)  and  (41). 

Let  a7  =  (  p7,  a2,  a7).  Here,  p7  is  a  vector  having 
components  p(t)  where  /  is  a  point  in  the  2-D  parametric 
space  on  which  the  object  surface  pattern  is  specified  as 
discussed  in  Section  II-E  and  (24) .  The  points  t  in  ques¬ 
tion  are  those  specifying  3-D  surface  points  seen  in  set  D{ 
or  set  D2  or  both,  where  Dx  and  D2  are  arbitrary  sets  in 
the  image  planes  in  CRF1  and  CRF2,  respectively.  As 
discussed  in  Section  II-E  and  (24),  through  qx\t,  a)  and 
b,  a)  point  t  maps  to  point  s  e  Dx  in  image  1  or  u 
e  D2  in  image  2,  or  both.  Then  the  Fisher  Information 
Matrix  is3 


F  =  —E 


=  E 


a2  In  p(/|,  /2 1  a r) 

(dar) 

d  Inpjlu  /2 1  a r) 

6a  t 


d  In  p(I\,  12 |ar) 


6a  7 


(33) 


where  a  T  is  the  true  value  of  a,  6  In  p  ( • ,  •  |  a  T) /da  T  is 
a  row  gradient  vector  evaluated  at  a  =  a  T,  and  62  In  p  ( • , 
•  |ar)/(6ar)2  is  a  2nd  partial  derivative  matrix  evalu¬ 
ated  at  a  =  a  T.  Let  A  be  the  error  covariance  matrix  for 
any  unbiased  estimator  for  a  based  on  data  in  Dx  and  D2. 
Then  the  Cramer-Rao  bound  is 


F~x  <  A  (33a) 

where  by  the  inequality  we  mean  that  A  —  F~]  is  non¬ 
negative  definite.  This  implies  that  if  v  is  the  parameter 
estimation  error  vector,  then 


trace  F  1  <  E[ vTv]  =  E 


2  (Vif 


=  2  E[(Vif]. 


3The  following  notation  is  used.  For  any  function  >v(a),  by  dw(aT)/ daT 
we  mean  dw(a)/da  |a  =  ar 


Furthermore, 

ii th  element  of  F~l  <  E[(vi)z], 

where  vt  is  the  ith  element  of  v .  Similar  results  apply  when 
it  is  a  biased  estimate  of  aT  [13].  Let  F~ 1  denote  the  di¬ 
agonal  subblock  of  F~l  that  applies  to  a  alone. 

We  define  p„(  • )  to  be  the  image  mean  value  function 
in  CRFn.  We  now  explore  in  detail  the  properties  of  the 
Fisher  information  matrix,  (33).  Let  u  and  s  be  related  by 
u  =  h(s,  b,  z(s,  aT )).  We  assume  that  t  =  qx(s,  a)  and 
t  =  q(u,  b,  a)  are  1 : 1  functions  on  Dx  and  D2 ,  respec¬ 
tively.  If  they  are  not  1:1,  then  two  or  more  points  on  the 
object  surface  project  onto  a  single  pixel  in  image  plane 
1,  or  onto  a  single  pixel  in  image  plane  2,  or  this  can 
happen  with  both  image  planes.  In  such  a  case,  we  use 
only  one  of  these  surface  points  in  the  following  devel¬ 
opment,  and  that  is  the  surface  point  that  projects  closest 
to  the  center  of  the  pixel  onto  which  these  surface  points 
project.  From  (24)  and  using  the  reasoning  leading  to  (26), 
we  have 

P(lu  h\ «,  P,  <r2) 


=  {lira1)  ~d'/2  exp  ,g,  [MO  ~  MO]*  j 

•  (2ir ff2)”*/2  exp  [M«0  -  MO]2  j- 

(26a) 

If  s  and  u  see  the  same  point,  specified  by  t,  on  the  object 
surface,  then  px (s)  =  p (t)  =  p2(u). 

The  exact  meaning  of  (26a)  requires  some  explanation. 
As  explained  in  Section  II-E,  the  pixel  at  s  in  D{  may  see 
that  portion  of  object  surface  seen  by  n  pixels  ux,  u2, 
*  •  •  ,  un,  in  D2.  This  dependence  of  the  value  of  px(s) 
on  the  values  of  p2(  • )  and  p(  • )  at  a  few  values  of  their 
arguments  poses  a  complication  when  trying  to  find  a  sim¬ 
ple  form  for  the  dependence  of  F~l  in  (33a)  on  camera 
geometry,  object  surface  parameterization,  and  object  re¬ 
flectivity  coefficient  pattern.  (In  theory,  F~x  in  (33a)  can 
be  computed  for  the  sensor  model  described. )  To  arrive 
at  a  sensor  model  that  is  easier  to  work  with,  we  assume 
that  the  noiseless  image  gray  level  at  pixel  s  in  Dx  is  due 
to  the  sensor  illumination  at  s,  the  center  of  the  pixel, 
only.  Then  there  is  only  one  of  the  u[s,  i  =  1,  •  •  •  ,  n, 
for  which  p2(Uj)  *  px (s),  which  is  the  m,  closest  to  h(s, 
b ,  z(s,  aT)).  Similarly,  there  is  only  one  /x(^)  for  which 
p(tj)  *  px(s),  namely,  the  ti9  i  =  1,  *  *  *  ,  n’ ,  that  is 
closest  to  qx(s,  a).  The  other  p2{ui)  and  p(tt)  are  not 
constrained  to  being  the  same  as  any  of  the  px (s'),  s'  e 
Dx.  A  similar  statement  applies  when  one  pixel  in  D2  sees 
a  region  of  object  surface  seen  by  n  pixels  in  Dx .  This  is 
the  model  that  we  now  use  in  this  section  and  Sections 
III- A,  B,  and  C.  (A  different  model  may  give  slightly  dif¬ 
ferent  results.) 
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From  (24),  we  have 


d^iO) 

da 

da 


d/*(giO.  fl)) 

da 


fy(<)  fyi(*»  «) 

dt  da 


t  =  q\(s,a) 


dn(q(s,  b,  a)) 


da 

dfx(t)  dq(u ,  b,  a) 
dt  da 


t  =  q(u,b7a) 


(34) 


Note,  in  the  above,  dq}(s,  a) / da  is  a  2  X  matrix  where 
£  is  the  number  of  components  in  a.  Similarly  for  dq(u , 
b ,  a)/ 5a. 

Define  D12  as  the  set  of  s  such  that  s  E  D\  and  s  belongs 
to  one  of  the  following  sets:  the  pixel  at  s  maps  onto  one 
or  more  pixels  in  D2;  or  two  or  more  pixels  in  Dx  map 
onto  the  pixel  at  u  in  D2,  and,  of  these  pixel  centers,  s  is 
the  one  for  which  h(s ,  b ,  z(s,  a))  is  the  closest  to  u. 

From  (26a)  and  (33),  following  some  substantial  work, 
we  arrive  at  (details  are  in  Appendix  Bl): 

T 


dn{qi(s,  aT))  dn(q(u,  b,  aT )) 

2  /  seD\2 

daj  daj 

dp(q]{sy  aT))  dp(q(u ,  b2 ,  aT)) 
daT  daT 


(35) 

Note  that  due  to  cancellations,  only  the  subset  Dn  of  Dx 
contributes  to  the  final  expression  for  the  bound  (35).  This 
is  as  expected  since  data  points  cannot  contribute  to  the 
estimation  of  a  unless  there  is  a  point  in  each  image  that 
is  due  to  the  same  point  on  the  object  surface.  The  right 
side  of  (35),  the  Cramer-Rao  Bound  for  the  estimation 
error  for  the  parameter  vector  aT ,  is  interesting.  Note  that 
the  Cramer-Rao  (C.R.)  Bound  is  inversely  dependent  on 
th  differences  in  the  gradient  vectors  dfxl(s)/daT  = 
dp(q{(sy  aT))/daT  and  dp2(u)  /  daT  =  dp(q(uy  b , 
aT))/daTy  i.e.,  the  gradients  with  respect  to  a  of  the  im¬ 
age  mean  values  taken  by  cameras  1  and  2,  respectively. 
Equivalently,  upon  using  (34),  this  difference  can  be  writ¬ 
ten  as 


d^[(s)  _  dfi2(u) 
daT  daT 


dt 


dqijs,  Oj)  _  dqju ,  b,  ar) 
daT  daT 


(36) 


Hence,  in  order  to  have  a  small  C.R.  bound,  one  wants 
the  difference  in  the  rate  of  change  of  t  with  respect  to  aT 
in  CRF1  and  CRF2  to  be  as  large  as  possible,  i.e.,  one 
wants  the  columns  of  the  matrix  in  the  brackets  in  (36)  to 
be  as  large  as  possible. 


A,  Cramer-Rao  Lower  Bounds  when  Using  an 
Alternative  Pattern  Parameterization 

It  is  often  convenient  to  take  the  a  priori  unknown  pat¬ 
tern  to  be  p\(s),  the  mean  value  function  for  image  1,  or 


p2(u)  rather  than  p{t).  There  are  two  reasons  for  this. 
First,  since  the  data  sets  that  we  are  dealing  with  are  lx 
and  /2,  the  easiest  pattern  parameterization  to  work  with 
is  often  the  mean  value  function  associated  with  one  of 
these  sets.  Second,  though  in  theory  we  can  always  spec¬ 
ify  an  object  surface  pattern  as  a  2-D  parameterization  on 
the  object  surface,  in  practice  this  may  be  messy.  From  a 
physical  point  of  view  there  is  a  slight  difference  between 
assuming  the  true  fixed  pattern  is  p(t)  and  assuming  it  is 
If  one  lets  a  vary,  then  in  the  first  case  the  mean 
value  function  for  the  first  image  varies  at  point  s  because 
it  is  given  by  ^(^(s,  a)),  i.e.,  because  the  obj ect  surface 
moves.  However,  in  the  second  case  p\(s)  is  fixed  and 
does  not  vary  with  a,  but  the  pattern  at  a  point  on  the 
object  surface  will  vaiy  as  a  varies.  In  general,  if  /^(s) 
is  taken  to  be  the  a  priori  unknown  pattern  parameters, 
then  some  of  the  elements  in  the  resulting  Fisher  Infor¬ 
mation  Matrix  will  differ  from  those  in  (blO).  However, 
an  interesting  result  easily  proven  using  the  types  of  iden¬ 
tities  developed  in  Appendix  B2,  is  that  the  C.R.  Bound 
for  the  a  vector  will  be  the  same  whether  the  unknown 
pattern  vector  is  taken  to  be  p\(s)y  or  p2(u)y  or  p(t). 
Since  the  theory  is  a  little  easier  to  interpret  for  p\(s)  [or 
p2(u)]  as  the  unknown  pattern  parameters,  the  remainder 
of  our  development  of  the  C.R.  Bound  is  for  this  case. 

Hence,  let  p\(s)y  s  e  DXy  be  the  a  priori  unknown  pat¬ 
tern  parameters.  Then  in  Appendix  B2  it  is  argued  that 
the  C.R.  Bound  for  aT ,  F~\  is  given  by 

2aS  £  d^jhjs,  b,  ar)) 

/  seDn  daj 

(37) 


dix,2{h(s,  b,  aT)) 


dar 


where  Dn  is  defined  before  (35).  (See  [17]  for  details.) 
Note  that  the  true  unknown  pattern  parameters  are  as¬ 
sumed  to  be  the  p\(s)  for  all  s,  but  it  is  convenient  to 
express  the  bound  in  terms  of  p2(u)  where  p2(h(s ,  by 
ar))  =  ^i(*y)« 

To  explore  (37)  further,  observe  that  it  can  be  ex¬ 
pressed  as 


24  S 

/  seDn 


dh(s,  b,  z(sy  aT)) 


da-r 


tyijun  ( d^20) 


du  I  V  du 


dh(sy  by  z{s,  aT)) 


daT 


/u  =  h(s,b7z(s,aT)) 


(38) 


From  (16), 


dh(s,  b,  z{s,  aT))  dz(s,aT) 

c\2  47  »  (39) 


daj 


daT 
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so  that 


fy2(“)  *(*■  ar)) 

bu  daT 


u  =  h(s,b,z(s,aT )) 


bu 


C12 


bz{s,  aT) 
bu  j 


(40) 


Then  (38)  is 


Fa'  =  2  o\ 


s 

^eDi2 


^2(«) 

bu 


C\2 


( 3z(s,  flr)\ 

\  3ar  J 


( 3z(s,  or) 

\  baT 


-l 


(41) 


Expressions  for  bz(s,  a)/ba  are  derived  and  given  in 
Section  II- A  for  the  sphere  and  Appendixes  C,  D,  and  E 
for  the  plane,  cylinder,  and  general  quadric,  respectively. 
By  assuming  a  deterministic  or  a  stochastic  model  for 
Pi(s),  it  is  sometimes  possible  to  obtain  a  closed  form 
expression  for  (41).  An  example  of  this  is  given  in  Section 
III-D . 


B.  Interpretation  of  the  Cramer-Rao  Bound 

The  C.R.  Bound  (41)  has  a  very  simple,  physically 
meaningful  interpretation.  First,  observe  that  the  depen¬ 
dence  of  the  bound  on  each  of  camera  geometry,  object 
surface  pattern,  and  object  surface  shape  is  immediately 
seen.  Specifically,  the  two-component  vector  c12  is  the 
projection  in  CRF2  of  a  unit  vector  in  CRF1  at  the  origin 
in  the  direction  of  the  £-axis.  It  represents  the  influence 
of  the  two-camera  geometry  on  the  Bound.  The  vector 
bp 2(w)/dw  is  the  gradient  of  the  mean  value  of  image  2 
intensity  function.  It  represents  the  contribution  of  the  ob¬ 
ject  surface  pattern  to  the  bound.  Finally,  the  vector  bz(s , 
a)/ba  is  the  contribution  of  the  object  surface  shape  to 
the  Bound.  It  represents  the  dependence  of  object  surface 
shape  on  the  parameter  vector  a. 

We  develop  the  interpretation  of  (41)  and  (38)  further. 
Note  that  because  of  the  use  of  the  orthographic  projec¬ 
tion,  cX2  gives  the  direction  of  all  epipolar  lines  in  CRF2. 
(By  an  epipolar  line  we  mean  the  following.  Each  point 
in  image  1  is  the  image  of  a  point  on  a  3-D  surface.  The 
ray  that  goes  through  a  3-D  surface  point  and  its  image 
point  in  image  1  is  seen  as  the  so-called  epipolar  line  in 
the  image  plane  of  camera  2.  Hence,  the  point  in  image 
2  that  is  the  image  of  the  3-D  surface  point,  must  lie  on 
this  epipolar  line.)  The  magnitude  of  cl2  varies  as  the  sine 
of  the  angle  between  the  optical  axes  of  the  two  cameras. 
We  see  that  this  will  be  very  small  for  optical  axes  that 
are  almost  parallel,  and  is  a  maximum  for  optical  axes  that 
are  orthogonal. 

The  partial  derivative  bz(s,  a) /ba(  is  the  rate  of  change 
with  respect  to  parameter  at  of  the  z  component  of  the 
3-D  surface  point  (sT,  z)  in  CRF1 .  (It  is  the  rate  of  change 
with  respect  to  at  of  distance  to  the  object  surface  at  point 
( sT ,  z).)  Hence  |  cnbz(s,  a)/da,|  is  the  magnitude  of  a 


directional  derivative— the  rate  of  change  with  respect  to 
at  of  the  image  in  the  camera2  image  plane  of  the  point  at 
(sT,  z(s,  a))  on  the  3-D  object  surface. 

Since  bp2(u)/bu  is  the  gradient  of  image  2  intensity, 
we  see  that  [bp2(u) / bu]  cl2[bz(sy  a)/bat]  is  the  rate  of 
change  with  respect  to  a,  of  the  intensity  in  image  2  in  the 
direction  of  the  vector  cl2  at  point  u  —  h(s ,  b ,  z(s,  a)). 
Because  of  the  inverse  operation  in  (41),  we  make  the 
qualitative  statement  that  the  larger  these  directional  de¬ 
rivatives  are,  the  smaller  will  be  the  covariance  matrix  for 
the  estimation  error  for  aT. 

From  the  preceding,  it  is  clear  that  maximum  parameter 
estimation  accuracy  is  obtained  for  the  following  condi¬ 
tions.  The  mean  value  function,  p2(u ),  of  the  image  of 
the  object  surface  pattern  should  be  rapidly  varying  so 
that  b\i2(u)/bu  is  large.  The  angles  between  the  optical 
axes  should  be  large  (90°  is  the  best)  in  order  that  cn  be 
large,  and  the  angle  between  the  image  intensity  gradient 
and  the  direction  of  the  epipolar  lines  should  be  small. 
Then  cn[bp2(u)  /  bu]  will  be  large.  Last,  it  is  desirable 
that  the  gradients  bz(s ,  aT)/baT  be  large  and  that  their 
directions  be  distributed  over  the  entire  space  as  s  varies 
throughout  Dn.  In  general,  these  conditions  are  increas¬ 
ingly  better  achieved  with  increasing  D12,  i.e.,  increasing 
patch  size.  If  there  is  flexibility  to  choose  the  direction  for 
at  least  one  camera  axis  during  the  surface  estimation 
procedure,  the  axis  should  be  chosen  such  that 
[bp2(u)/bu]Cn  is  large  for  most  u  in  h(Dn ,  b,  a).  The 
choice  can  be  made  more  specific  depending  on  the  situ¬ 
ation  encountered. 


C.  Another  Interpretation  Associated  with  the  Fisher 
Information  Matrix 

Recall  that 

e6n{a)=  ?  [/,(»)  -  I2{h{s,  b,  z(s,  a)))]'.  (14) 

seZ>i2  L 

Assume  that  the  additive  noise  is  0,  so  that  Ifs)  =  pfs) 
and  I2(u)  =  p2(u).  Expanding  the  resulting  function 
ebl2(a)  about  the  point  aT  in  a  Taylor  series  up  through 
quadric  terms  gives  us 


eb  12(«)  *  e'0n(aT)  + 


debn(aT) 

baT 


(a  -  aT) 


+  \  -  aT)T | 


52^,2(a7-)\ 

(darf  ) 


(a  —  aT). 


(42) 


But  ebn(aT)  =  0,  and  rde'c,n(aT) / daT  =  0.  Furthermore, 

T 

ffbJ^r)  _  ^  / d^jhjs,  b,  ar))\ 

( daTf  s*B'i  l  daT  ) 


1 9|a2(fe(s,  fr,  arf 
l  baT 


(43) 
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Hence, 


ebn(a)  ®  2  (a  -  aT)‘ 

seD\ 2 


bp2(h(s,  b ,  aT )) 


/ bp2(h{s,  b ,  ar)) 

1  daT 


baT 


( a  -  aT). 


(44) 


Recall  from  (37)  that  the  inverse  of  (43)  multiplied  by 
c jj  is  the  C.R.  Bound  for  the  achievable  error  in  estimat¬ 
ing  aT.  From  (44),  we  see  that  (43)  also  determines 
ebnia).  The  function  e’^fa)  is  narrow  if  and  only  if  (43) 
is  large.  Hence,  we  see  that  the  C.R.  Bound  is  small  if 
and  only  if  e£12(a)  is  a  narrow  function. 


D.  An  Example  of  a  Numerical  Computation  of  the 
Cramer-Rao  Bound 

Some  additional  insight  is  provided  by  a  simple  exam¬ 
ple  for  (41).  Let  the  3-D  surface  be  the  plane 


the  variance  of  the  error  in  estimating  plane  orientation 
decreases  as  the  square  of  this  rate,  because  the  further 
away  the  image  pixels  are  from  the  patch  center,  the  more 
they  contribute  to  the  plane  orientation  estimation  accu¬ 
racy.  We  feel  that  the  greatest  value  of  the  expressions 
for  the  C.R.  Bound  is  a  feeling  for  how  the  parameter 
estimation  error  depends  on  the  object  surface  pattern  and 
the  camera  and  object  geometry.  However,  it  may  also 
be  of  interest  to  look  at  numerical  examples.  Hence, 
consider  \dp2(u)/du\  =  3,  o\  =  4,  angle  between 
(dp2(u)/du)T  and  cx2  equal  45°,  and  |  c12 1  —  0.17  (cor¬ 
responding  to  an  angle  of  10°  between  the  optical  axes  of 
the  two  cameras ) .  Note  that  aT  is  due  to  the  additive  noise 
in  our  models,  but  in  practice  it  can  account  for  quanti¬ 
zation  and  other  numerical  errors  in  the  entire  measure¬ 
ment  system.  Then  for  N  =  16  pixels,  we  have  6.5  < 
Varp0>  and  0.043  <  Varp!  and  Var  p2.  Hence,  the  lower 
bounds  for  the  standard  deviations  of  the  slopes  of  the 
plane  in  the  vertical  or  the  horizontal  directions  are  0.21. 


Z  =  Po  +  P\S  1  +  Pls2 
and  let  p2(u)  be  linear, 

Pi{ti)  =  7o  +  7i«i  +  72^2 

where  (s1#  s2)T  =  s  and  (uu  u2)T  =  u.  (This  choice  for 
p2(u)  means  that  /x,(s)  will  also  be  linear.)  Then 
bp2(u)/bu  =  (7i,  y2),  so  that  [dp2(u)/du]  cl2  is  a  con¬ 
stant  independent  of  w,  or,  equivalently,  of  s  where  u  = 
h(s ,  b ,  z(s,  aT))\  it  can  therefore  be  taken  out  of  the 
summation  in  (41).  Since  dz(s,  aT)/baT  =  (1,  su  s2), 
we  see  that  (41)  is 


IV.  Conclusions 

A  parametric  modeling  and  statistical  estimation  ap¬ 
proach  has  been  proposed  and  simulations  shown  for  es¬ 
timating  3-D  object  surfaces  from  images  taken  by  cali¬ 
brated  cameras  in  two  positions.  The  parameter  estimation 
suggested  is  gradient  descent,  though  other  search  strat¬ 
egies  are  also  possible.  Processing  image  data  in  blocks 
(windows)  is  central  to  our  approach.  The  estimation  is 
estimation  of  patches  of  3-D  surface  by  searching  in  pa¬ 
rameter  space  to  simultaneously  determine  and  use  the  ap- 


r 


24 


d^2  («) 

du 


V. 


N2  X-  (N  -  1  )N2 

1  )N2  1  N-\  (. N 

]-  (N  -  1  )N2  i  (N  -  1)V 


\(N~  1  )N2 


1  )N2 


1 

4 

1 

3 


(N  -  1  )V 

n~\ 


3 


-1 


l)N 2 


J 


where  we  have  used  Sf="o'  *  =  (1/2)(N  -  1)JV  and 
Ef=_o'  i2  =  (1  /3 )(N  -  1  )(N  -  (1/2 ))N.  Upon  carry- 
ing  out  the  matrix  inversion  and  then  keeping  only  the 
largest  power  of  N ,  we  have  the  approximation 


2(7^ 

IN ~2 

-6AT2 

-6AT3 

(d^(u)  \2 

—6N~3 

12  AT4 

0 

l\  du  12j  J 

1 - 

1 

1 

0 

12J\T4_ 

(41a) 

for  the  C.R.  Bound  for  E[(a  -  aT)(d  -  aT)7].  This 
becomes  the  exact  bound  as  N  becomes  large.  The  diag¬ 
onal  elements  of  (41a),  starting  with  the  top,  are  the 
variances  of  the  errors  in  estimating  p0,  p1?  and  p2,  re¬ 
spectively.  We  see  that  the  variance  of  p0,  the  position  of 
the  3-D  plane,  is  inversely  proportion  to  image  patch  size, 
the  number  of  pixels  used  in  the  stereo  estimation.  But 


propriate  pair  of  image  regions,  one  from  each  image,  and 
to  use  these  for  estimating  a  3-D  surface  patch.  Though 
the  choice  of  performance  functional  was  motivated  by 
consideration  of  engineering  reasonableness,  we  derive 
the  expression  for  the  joint  likelihood  of  the  two  images, 
and  show  that  the  algorithm  is  a  maximum  likelihood  pa¬ 
rameter  estimator,  and  thus  enjoys  the  desirable  estima¬ 
tion  accuracy  properties  of  maximum  likelihood  esti¬ 
mators.  A  very  important  concept  arising  in  the  maximum 
likelihood  estimation  of  3-D  surfaces  is  that  the  patterns 
on  the  3-D  surfaces  must  also  be  modeled  and  estimated. 
We  do  this  for  the  case  of  completely  arbitrary  patterns, 
in  this  paper,  and  deal  with  restricted  pattern  classes  else¬ 
where  [28].  Finally,  Cramer-Rao  Lower  Bounds  are  de¬ 
rived  for  the  covariance  matrices  for  the  errors  in  esti¬ 
mating  the  a  priori  unknown  object  surface  shape 
parameters.  No  surface  reconstruction  algorithm  can  be 
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more  accurate  than  these  bounds,  but  the  accuracy  of  our 
maximum  likelihood  estimator  approximates  these  bounds 
as  the  size  of  the  image  patches  used  becomes  large. 

Following  are  a  few  points  that  may  answer  some  ques¬ 
tions  occurring  to  a  reader. 

1)  In  Appendix  A,  we  derive  the  orthographic  projec¬ 
tion  approximation  to  the  more  exact  perspective  projec¬ 
tion.  It  can  be  made  to  be  arbitrarily  accurate  by  applying 
it  to  suitably  small  3-D  regions— a  different  approxima¬ 
tion  for  each  such  region.  Our  primary  purpose  in  using 
this  approximation  is  that  a  C.R.  Bound  for  the  perspec¬ 
tive  projection  would  probably  be  uninterpretable,  and  the 
C.R.  Bound  based  on  the  orthographic  projection  model 
probably  captures  the  behavior  of  the  C.R.  Bound  based 
on  the  perspective  projection  very  well. 

2)  Though  the  surface  estimation  algorithm  in  this  pa¬ 
per  uses  the  orthographic  projection  model,  with  trivial 
changes  the  approach  applies  to  the  perspective  projection 
model.  However,  with  the  perspective  projection,  the 
transformation  from  image  1  to  image  2  is  such  that  there 
is  no  longer  a  computational  advantage  to  computing  the 
gradient  (15)  directly.  Rather,  we  compute  the  gradient 
by  computing  directional  derivatives  by  evaluating  (14)  at 
the  appropriate  points.  The  estimation  accuracy  is  good, 
but  the  required  computation  becomes  a  few  times  larger 
than  is  the  computation  based  on  the  orthographic  projec¬ 
tion.  The  perspective  projection  model  is  used  in  the  sur¬ 
face  reconstruction  experiment  in  Fig.  17  where  the  data 
used  are  a  sequence  of  real  images  taken  by  a  CCD  cam¬ 
era  mounted  on  a  moving  robot  arm. 

3)  A  major  use  of  the  C.R.  Bound  is  an  understanding 
of  the  relative  importance  of  various  factors  on  accuracy. 
The  way  image  gradient,  camera  geometry,  and  3-D  sur¬ 
face  parameter  dependence  enter  (37)  appears  to  be  very 
fundamental  and  should  prove  useful  even  when  the  true 
model  differs  somewhat  from  the  assumed  one.  However, 
it  is  possible  to  derive  exact  C.R.  Bounds  for  other 
models.  For  example,  if  the  images  have  low  noise,  im¬ 
age  quantization  into  pixels  can  introduce  a  sizable  noise. 
If  at  a  point  the  image  intensity  varies  by  Q  units  over  an 
interval  equal  to  the  extent  of  a  pixel,  then  the  image 
quantized  into  pixels  can  have  an  equivalent  noise  fluc¬ 
tuation  of  Q/2  at  the  pixel  in  question.  This  noise  is  im¬ 
age  dependent  and  is  usually  correlated.  A  C.R.  Bound 
can  be  derived  for  this  case. 

4)  Accurate  3-D  surface  reconstruction  requires  the  use 
of  dependency  among  points  within  sizable  regions  of  the 
surface.  An  effective  way  of  imposing  such  dependence 
is  by  using  prior  knowledge  of  3-D  surface  structure. 
Since  planes,  spheres,  and  cylinders  are  effective  in  effi¬ 
ciently  modeling  most  man-made  objects  [30],  we  have 
emphasized  them.  The  more  a  priori  information  one  uses 
in  surface  reconstruction  (or  for  that  matter  in  any  infer- 
encing  problem),  the  more  accurate  will  be  the  result.  And 
maximum  accuracy  is  obtained  by  using  all  of  this  a  priori 
information  at  the  time  of  processing  the  raw  data. 

5)  We  do  not  have  the  entire  system  sufficiently  well 
calibrated  in  order  to  determine  the  exact  surface  recon¬ 
struction  accuracy  based  on  real  data.  One  reason  for  run¬ 


ning  the  algorithm  on  the  semi-artificial  data  used  in  the 
experiments  in  the  paper,  is  that  we  can  report  this  accu¬ 
racy  exactly.  The  images  used  in  our  experiments  had 
low  sensing  noise,  so  the  existing  noise  was  largely  col¬ 
ored  image-dependent  noise  due  to  pixel  quantization.  As 
seen,  our  algorithms  work  well  in  the  presence  of  this 
noise.  Moreover,  recent  experiments  (e.g.,  Fig.  17)  in¬ 
dicate  that  the  reconstruction  works  very  well  with  real 
data  too.  From  these  experimental  results,  we  feel  that 
this  algorithm  is  robust  to  deviations  from  the  assumed 
model  that  may  be  encountered. 

6)  Our  performance  functionals  can  be  multimodal. 
Some  feeling  for  the  width  of  the  main  lobe  is  given  by 
the  curves  shown  for  the  experiments  reported  on  in  the 
paper.  The  true  value  of  the  surface  parameter  vector  aT 
is  always  at  the  global  minimum  of  the  performance  func¬ 
tional  (14)  if  the  image  noise  is  0.  If  the  noise  is  nonzero, 
the  a  at  the  global  minimum  of  (14)  converges  to  aT,  un¬ 
der  weak  conditions,  as  the  size  of  the  image  patches  used 
becomes  large.  In  statistics  jargon,  the  mle  is  consistent. 
However,  the  performance  functional  can  be  almost  flat 
in  the  vicinity  of  its  global  minimum.  The  shape  of  the 
functional  there  depends  on  the  image  intensity,  surface 
geometry,  and  camera  positions.  The  interpretation  in 
Sections  III-B  and  III-C  of  the  C.R.  Bound  provides  in¬ 
sight  into  what  determines  how  flat  or  steep  the  perfor¬ 
mance  functional  is.  One  of  the  factors  that  greatly  affects 
the  width  of  the  main  valley  of  the  performance  functional 
is  the  angle  between  the  optical  axes  of  the  two  cameras. 
The  width  decreases  with  increasing  angle.  This  property 
is  exploited  to  provide  our  computationally  simple  algo¬ 
rithm  of  Section  II-D.  We  begin  with  a  small  angle,  so 
that  we  can  use  an  arbitrary  first  guess  for  the  surface  pa¬ 
rameter  vector,  and  then  minimize  the  unimodal  perfor¬ 
mance  function.  The  resulting  estimate  of  a T  is  highly  in¬ 
accurate,  but  it  is  accurate  enough  to  lie  in  the  global 
valley  of  the  performance  functional  resulting  from  use  of 
a  larger  angle.  Then,  starting  with  the  aT  estimate  found, 
a T  is  reestimated  with  this  larger  angle.  This  approach  is 
repeated  a  few  times  until  an  accurate  estimate  of  a T  is 
obtained. 

7)  Using  our  approach,  we  can  do  maximum  likelihood 
pointwise  estimation  of  3-D  surfces,  but  we  do  it  by  as¬ 
suming  a  surface  patch  model.  Then  the  mle  of  a  point  on 
the  surface  patch  is  the  point  on  the  estimated  surface 
patch  that  corresponds  to  the  image  of  the  point.  If  the 
image  point  is  s  in  image  1 ,  then  the  estimated  3-D  point 
is  (sT,  z)T  where  z  is  the  value  satisfying  g(z,  s,  a)  =  0. 
Since  mle’s  have  minimum  variance  as  data  patch  size 
becomes  large  under  fairly  general  conditions,  our  esti¬ 
mators  should  pretty  much  have  maximum  accuracy.  If 
the  object  surface  has  special  modelable  pattern  structure 
such  as  contours,  across  which  the  patterns  are  discontin¬ 
uous  and  along  which  they  are  smooth,  then  greatest  ac¬ 
curacy  is  achieved  by  using  this  information.  We  do  this 
in  [28]. 

8)  Insights  into  appropriate  image  and  surface  patch 
sizes  to  use  can  be  provided  in  terms  of  image  noise,  sur¬ 
face  curvature,  etc.,  but  this  would  require  some  lengthy 
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development.  In  brief,  if  the  image  noise  is  white,  then 
by  using  concepts  and  techniques  such  as  those  in  [12], 
[26],  it  is  possible  to  determine  approximately  the  error 
covariance  matix  for  estimating  the  parameter  vector  for 
a  surface  patch,  and  through  a  combination  of  analysis 
and  experimentation,  it  is  possible  to  estimate  the  proba¬ 
bility  of  correctly  recognizing  which  of  a  number  of  3-D 
surface  models  is  correct  for  a  certain  surface  patch.  The 
C.R.  Bound  (41a)  also  provides  insights  on  estimation  ac¬ 
curacy.  On  a  more  informal  level,  an  image  patch  size 
should  be  chosen  to  be  large  when  the  standard  deviation 
of  noise  (either  from  measurement  or  from  quantization) 
is  large.  The  more  parameters  that  must  be  estimated,  for 
a  surface  patch,  the  larger  should  be  the  data  patches  that 
are  used.  For  example,  in  the  presence  of  modest  noise, 
small  planar  or  spherical  patches  can  be  estimated  that  fit 
the  sphere  surface  well,  but  the  estimate  of  the  sphere 
radius  and  center  may  be  greatly  in  error.  In  order  to  es¬ 
timate  these  parameters  well,  it  may  be  necessary  to  use 
a  patch  covering  one  eighth  or  more  of  the  sphere  surface. 
An  unconstrained  quadric  patch  has  even  more  parameters 
to  be  estimated.  However,  accurate  estimates  of  curvature 
can  be  obtained  with  small  patches  if  a  sequence  of  im¬ 
ages  is  used  [27]. 

9)  We  see  two  ways  for  estimating  large  complex  sur¬ 
faces  from  small  patches.  One  approach  is  to  model  a  sur¬ 
face  as  a  stochastic  process.  This  is  done  in  [3]  where  the 
model  is  a  continuous  surface  of  planar  patches  with  the 
set  of  parameter  vector  values,  one  vector  value  for  each 
patch,  modeled  as  a  Markov  random  field  (MRF).  This 
MRF  provides  the  a  priori  distribution  (knowledge),  for 
the  global  surface,  and  indirectly  contains  information 
such  as  surface  curvature,  blob  sizes,  etc.  There  it  is 
shown  how  the  estimators  of  the  present  paper  fit  into  the 
more  global  scheme.  The  other  approach  is  to  use  the 
maximum  likelihood  clustering  approach  of  [26]  to  first 
estimate  small  surface  patches,  and  then  cluster  these  into 
large  regions  with  each  large  region  associated  with  a  sin¬ 
gle  surface  model,  e.g.,  with  a  single  smooth  surface  free 
of  gradient  discontinuities. 

As  we  point  out,  because  of  the  probabilistic  formula¬ 
tion  of  the  problem,  the  powerful  machineiy  of  Bayesian 
inference  can  be  brought  to  bear  in  our  approach.  In¬ 
cluded  here  is  approximately  Bayesian  recognition  of  the 
object  surface  shape  class  associated  with  the  block  of 
data  under  consideration.  That  is,  recognition  of  which  of 
a  sphere,  a  cylinder,  a  plane,  some  other  parameterized 
surface,  or  two  or  more  surfaces  are  associated  with  the 
data  block.  (The  asymptotic  Bayesian  recognition  meth¬ 
ods  in  [12,  Section  V]  are  directly  applicable  here.)  Also, 
the  approaches  in  the  preceding  point,  9,  are  Bayesian. 

Appendix  A 

Orthographic  Approximation  to  Pinhole  Camera 
Model 

Suppose  the  ORF  is  chosen  to  be  the  same  as  CRF1, 
and  the  2-D  image  coordinates  are  in  the  same  units  as 
are  the  3-D  coordinates.  Let  P  be  an  arbitrary  surface  point 


seen  in  image  1.  Let  r  =  (x  y  z) 7 be  the  3-D  coordinates 
of  P  with  respect  to  the  ORF,  and  s  =  ( sx  sy)T  be  its 
corresponding  2-D  image  coordinates  in  image  1.  Then, 
upon  using  the  homogeneous  coordinate  transformation, 


That  is, 


sx  *  VV 
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where /is  the  camera  focal  length.  Let  r0  =  (x0  yo  £o)r 
denote  a  3-D  point  seen  in  the  image.  A  Taylor  series 
expansion  about  r0  is 


s  =  n(r)  =  n(ro)  +  Yr  ^  *  ( r  ~  r + 


— 1 

£ 

1 _ 

0 

>? 

o 

tv 

1 

+ 

1 

o 

(/-  Zo)2 

/•  yo 

1 

o 

/ 

/•  yo 

L/-  ZoJ 

o 

tv 

1 

(/-  Zo)2_ 

X  -  *0 


y  -  yo 


+ 


l_z  “  Zo 


Therefore,  we  can  approximate  s  as  a  linear  function  of 

y,  z 
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This  approximation  is  quite  accurate  for  \z  —  Zol/I/  ~~ 
Zo  |  much  smaller  than  1 .  The  optical  axis  of  the  camera 
is  the  ray  x  =  y  =  0.  Suppose  (x,  y,  z)  is  a  point  on  an 
object  surface,  and  let  jc0,  y0  be  the  center  of  an  image 
window. 

If  the  object- to-camera  distance  is  at  least  a  few  focal 
lengths,  i.e.,  |z0//l  »  1;  the  object  diameter  to  object- 
to-camera  distance  is  small,  i.e.,  \z  ~  Zo\/\f  —  Zq\  « 

1  such  that  |*o(£  —  Zo\/\f~  Zo)  |  «  M  and  |y0(z  -  z0)/ 
(/  —  Zo)  I  «  1  >' ! ;  then  (a2)  is  a  good  approximation  to 
(al)  within  the  window. 

s-*/7 7/  ’’-fH’-  <a2) 

If  this  s*  and  this  s>;  are  normalized  by  dividing  by  the 
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known  scale  factor  //(/  -  z0),  the  resulting  coordinates 
of  the  image  point  are  the  3-D  point  coordinates  x  and  y, 
respectively.  This  is  the  orthographic  imaging  model  used 
in  the  body  of  the  paper. 

Appendix  B1 

Derivation  of  the  C.R.  Bound 

We  briefly  derive  the  C.R.  Bound  for  aT.  We  begin  with 
(26a). 

d  In p(Iu  I2\a)  l  ,  r 

+  [I2{u)  -  #i(<)]«2(0}  (bl) 

where  bk{t)  takes  values  1  or  0  according  to  whether  or 
not  the  point  t  is  seen  at  a  pixel  in  Dk.  Hence,  =  1 
for  all  t  =  q{(s,  a),  s  e  Du  and  d2(t)  =  1  for  all  t  such 
that  t  =  q(u ,  b ,  a),  u  e  D2.  Furthermore,  suppose  the 
pixel  at  s  in  Dx  maps  onto  two  or  more  pixels  centered  at 
points  u  in  D2.  Then  we  will  consider  that  pixel  center  u 
that  is  closest  to  h(s,  b,  a)  to  be  such  that  q^s ,  a )  =  t 
=  q(u,  b,  a),  and  for  this  t ,  5\(t)  =  d2(t)  =  1.  A  similar 
statement  applies  if  the  pixel  at  u  in  D2  maps  onto  two  or 
more  pixels  centered  at  s  in  Dx.  From  (bl), 

AM  =  M(s)  &iM  +  h(u)  ^Ml/I^iM  + 

(bl) 

From  (b2)  it  is  seen  that  / u(t)  is  an  unbiased  estimate. 
Continuing,  we  obtain 

3  In  p(Iu  I2  |  a) 


d\  +  d2 


S  [/,(*)  -  /*,(*)] 

seD\ 


+  2  [/2(k)  -  n2(u)\  . 

ueD2  ) 


Hence, 


srr*  1  -  *<•>] 


+  2  [i2(u)  -  h(u)] 

ueD2 


+  S  [/,(«)  -  ,2(u)}  ^ 

ueD2  0a 


As  seen  from  Section  II,  there  is  not  a  simple  explicit 
expression  for  a.  Rather,  a  must  be  obtained  numerically 
by  minimizing  (14),  or  equivalently,  (27).  However,  it  is 
easy  to  show  that  a  is  unbiased  for  3-D  planar  surfaces, 
and  is  asymptotically  unbiased  for  more  general  but  rea¬ 
sonably  smooth  3-D  surfaces. 

The  negative  of  the  expectation  of  the  second  deriva¬ 
tives  is  now  computed. 


d2  In  p(Iu  I2 1  a) 


dn(t)  dfi(t')  J 

f  0  if  t'  *  f; 

1  ^  [*i(0  +  «z(0]  i U’=t. 
d2  lnp(/1;  /2|a)~|  di  +  d2 


L  (d°2)  J 

If  dn  =  d{  =  d2 ,  (b6)  becomes  d{a~4 . 

d2  In p(IuI2\*T) 
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fyjqi M  «r))  Or)) 

daT  daT 


dnjqju,  ft,  ar))  ft,  aT)) 

daT  daT 


where  p\(s)  =  ix(qx(s,  a))  and  p2 (u)  =  p(q(u,  b ,  a)). 
From  (b3),  we  see  that  b2  is  asymptotically  unbiased  as 
Di  and  D2  become  large  if  and  only  if  Dx  ~  Du  where 
Dn  is  defined  in  Section  Il-E. 

ainp(/„/2|o)  1  f  v  .  ,^3/Us) 

,  V  d^2(u)'\ 


There  are  cross  terms.  These  are  seen  by  direct  com¬ 
putation  to  be 

J>ln  p(Iu  I2\aT)]  n 

~E  - - -jr -  =  0  (b8) 

oo juaj 


d2\np(Iu  I2\aT)l  l  (  dp(q}(s,  aT)) 


dpT(t)  daT 


d/i(g(u,  ft,  aT)) 

+ - bTt - 62(0  ' 


Finally,  note  that  - E[d 2  In p(I\,  I2  \  aT)/dfiT{t)  do2T ]  = 
0. 

From  the  preceding,  the  Fisher  Information  Matrix  has 
the  form 
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where  L  is  the  number  of  those  pixels  on  the  object  sur¬ 
face  that  are  seen  in  at  least  one  of  the  Dx  and  D2, 


=  —E 


fo  "  E 


d2  In  p(Iu  I2\ut) 

( dfiT(ti )) 

a2  In p(I\,  h\ <*r) 
(dolf 


,  1=1,  2, 


L, 


and  for  aT  having  K  components, 
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and  H  has  r'th  row 
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(bl2 ) 


Our  primary  interest  is  in  finding  a  lower  bound  for  the 
error  covariance  matrix  for  estimating  dj.  We  make  use 
of  the  following  known  matrix  inversion  result 


Summing  s  over  h~\D2 ,  b ,  aT)  in  the  last  summation  in 
(bl2)  is  equivalent  to  summing  u  over  D2.  Equation  (bl2) 
simplifies  to 


F~l  = 
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G  =  Rt,  Q  =  Q, 


so  that 


F  = 


S  I 

g  I  e 


there  results 

ET1  =  (G  -  gs-'j?) 


Note  that  due  to  cancellations,  only  the  subset  D12  of  D\ 
contributes  to  the  final  expression  for  the  C.R.  Bound, 
(bl2a).  [ Dn  is  defined  preceding  (35).]  This  is  as  ex¬ 
pected  since  data  points  cannot  contribute  to  the  estima¬ 
tion  of  d  unless  there  is  a  point  in  each  image  that  is  due 
to  the  same  point  on  the  object  surface.  The  right  side  of 
(bl2a),  the  Cramer-Rao  Bound  for  the  estimation  error  for 
the  parameter  vector  aT,  is  interesting  and  is  interpreted 
in  Section  III-C. 
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We  emphasize  that  the  surface  pattern  model  used  here 
is  that  for  which  the  reflectance  at  each  surface  point  is 
an  a  priori  unknown  arbitrary  parameter.  For  this  model, 
the  only  data  that  contributes  to  surface  reconstruction  are 
pairs  of  points,  one  point  in  each  image  where  each  point 
in  a  pair  is  an  image  of  the  same  point  on  the  object  sur¬ 
face.  If  other  surface  pattern  models  are  used,  such  as 
contour  polynomial  models  [28],  or  Markov  random  field 
texture  models  [29],  then  even  if  a  surface  point  is  seen 
in  only  one  image,  the  data  point  does  contribute  to  the 
3-D  surface  reconstruction. 


Appendix  B2 

Proceeding  similarly  to  the  derivation  of  (bl2a),  in  this 
appendix  the  C.R.  Bound  for  aT  is  found  in  terms  of  p2{ u ) 
when  ptj(s)  is  assumed  to  be  the  true  a  priori  unknown 
pattern  parameters. 

We  introduce  a  few  identities  that  are  useful  for  manip¬ 
ulating  the  expressions  that  must  be  dealt  with  in  these 
multi-image  problems.  First,  how  are  the  spatial  gradients 
of  two  images  related?  Since  p(q{u,  b ,  a))  =  p2(u) 
where  t  —  q(u ,  b ,  a),  it  follows  that 

dn2(u)  dfi(t)  dq(u,  b,  a) 

~~du~  =  ~dt  du - '  (bl3) 


Similarly,  since  px(h  \u ,  b ,  a))  =  ju2(w)  where  s  = 
h~\u,  b ,  a),  it  follows  that 


d/x2(w)  _  dp^s)  dh  \u ,  b,  a) 
du  ds  du 


(bl4) 


Similar  expressions  exist  when  qx(s ,  a)  and  h(s,  b ,  a) 
are  used  in  place  of  q(u,  b>  a)  and  h~{(u,  b,  a),  respec¬ 
tively.  Finally,  since  h~l(u,  b ,  a)  =  qx\q(u,  b ,  a),  a ), 
and  t  —  q\(qi\t ,  a),  a),  it  follows  that 

dfr\u,  b,  a)  _  dq£\t)  dqju,  b,  a)  a ) 

da  dt  da  da 


(bl5) 

and 

jfc  _  0  _  a)  9gr‘(<,  a)  dg,(s,  «) 

da  ds  da  da 


Note  that  when  computing  d^2(w)/da7,  the  dependence 
of  p2( u )  on  ar  is  through  u  -  h(  s,  b ,  aT),  where  s  e  DX1. 
Also,  we  have  used  the  fact  that  dpx(s)/daT  —  0  for  all 
s  since  the  px(s)  are  assumed  to  be  the  true  pattern  param¬ 
eters.  Furthermore,  -E[d2  In  p(Ix,  I2  \  aT) / daTd\ix  ]  is  a 
matrix  having  row 


dfii(h  '(«,  b,  aT)) 


°T 


daj 


(bl8) 


associated  with  point  s  =  h  l(u,  b,  aT ).  Upon  using  this 
and  (b!7)  in  equations  similar  to  (blO)  and  (bl  1),  we  find 


Fa'  =  2  ol 
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ueh(D\2,b,aT ) 


da-r 


fy2(«) 


da t 


The  explicit  dependence  on  s  is  exhibited  in 


(b  19 ) 


F~ 1  =  2  a 


dn2{h(s,  b,  aT)) 


daT 


d/x2(/?(<s,  b,  ^7’)) 
daj 


(b20) 


If  we  wish,  from  (b2),  dp2(h(sy  b ,  aT))/daT  can  be  ex¬ 
pressed  in  terms  of  px(s)  by  (dpx(s) / ds)(dh(s,  b , 
aT)/ds)(dh(s ,  b ,  aT)/daT ). 


Appendix  C 
The  Plane 

We  derive  the  expression  for  the  vector  dz/da  for  a 
plane.  Note  that  there  are  a  number  of  different  sets  of 
parameters  that  can  be  used  for  representing  a  plane  (or  a 
cylinder,  or  a  more  general  surface).  We  use  the  canonical 
parameterization  in  this  section.  We  use  the  equation 

0  =  g(x,  y,  z)  =  1 6\X  +  02 y  +  0iZ  -  d  (cl) 


(bl6) 

respectively. 

We  begin  with  (26).  Here,  the  first  summation  is  not  a 
function  of  a,  only  the  second  summation  is.  Because  p2 
is  related  to  px  through  fi2(h(s ,  b ,  aT))  =  (s),  and  the 

px(s)  are  limited  to  s  e  Du  it  follows  that  the  only  values 
of  s  and  u  that  contribute  to  the  C.R.  Bound  are  s  e  Dn 
and  u  e  h(Dn ,  b ,  aT)  where  D12  is  defined  preceding 
(35).  Proceeding  as  for  (b7),  we  find 
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(daTf 
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3^2(«) 

daT 
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subject  to  the  constraint 

0  =f(x,y,z)  =  0\  +  0\  +  0\  -  1.  (Cl.l) 

Note,  |  d  |  is  the  distance  from  the  plane  to  the  origin  in 
this  representation.  It  is  assumed  that  the  plane  is  in  gen¬ 
eral  position,  because  if,  e.g.,  =  0,  then  the  plane  nor¬ 

mal  is  orthogonal  to  the  first  camera’s  optical  axis,  and 
the  plane  surface  is  not  seen  by  the  first  camera  since  the 
camera  then  sees  only  the  plane’s  edge.  Equation  (cl.l) 
can  be  used  to  solve  for  03  in  terms  of  fix  and  /32.  Hence, 
we  can  take  a  to  be  ar  =  (0i,  /32,  d).  Now  dz/da  = 
~((dg/da)/(dg/dz)).  Using  (cl.l),  we  get  dh/dfr  = 
-(O//3f3,)/O//d03))  =  -2/3,/2/33  =  -(fl./fr)- 
Similarly,  dfi}/d02  =  —((df/d02)/(df/d03))  — 

~~(0i/ 03>-  Hence,  dg/dz  =  03 
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dg 

da 


Thus, 


0g  903  _  0i  _  03*  -  01  z 

00,  _  ^  00,  X  Z  03  03 

9g  903  _  03 y  -  02z 

002  y  Z  002  03 


9z  _  / giZ  -  03*  02?  -  03y  J_\ 
9«  \  ^  ’  03  ’03/ 


(c2) 


Appendix  D 
The  Cylinder 

The  canonical  parameterization  here  is 
0  =  g(x ,  y,  z)  =  {x  -  x0f  4  (y  -  y0f  4  (z  -  z0f 
—  (axx  4  a2y  4  «3z)2  —  R2 

&\Xq  4  a2y0  4  a3Z0  =  0 

a]  4  a\  4-  ol\  =  1.  (dl) 

The  unit  vector  (ofj,  a2,  0*3)  7  is  in  the  direction  of  the 
cylinder  axis,  and  the  second  equation  in  (dl)  forces  the 
point  (x0,  y0,  zq)  to  be  the  point  on  the  cylinder  axis  that 
is  closest  to  the  origin.  Because  of  the  third  equation  in 
(dl),  we  take  two  of  the  three  at  to  be  independent  param¬ 
eters,  and  because  of  the  second  equation  we  take  two  of 
jc0,  y0,  Zo t0  be  independent  parameters.  Hence,  we  choose 
a  =  (*o,  ?o>  <*2,  R)T-  Then 

dot3/dax  =  -a{/ot3,  da3/da2  =  —a2/a3 

p-  =  -2(*  -  *0) 

0*0 

p-  =  -2 (y  -  y0) 

9y  0 

-2(°,*  +  «.)’  +  «»4  +  *§) 


Appendix  E 

The  General  Quadric  Surface 
The  general  quadric  is  given  by 

0  =  g(x9  y,  z)  =  aux2  4  2 anxy  4  2 anxz 

4  a22y2  +  2  a23yz  4  a33z2  4  2a4Xx 

4  2a42y  +  2a43z  4-  <344.  (e0 

As  can  be  seen  in  (el),  the  parameter  values  for  a  surface 
are  not  unique,  as  multiplication  of  all  coefficients  by  the 
same  arbitrary  constant  leaves  the  equation  unchanged. 
Hence,  a  constraint  must  be  imposed.  One  that  results  in 
quadric  surface  estimation  that  is  invariant  to  the  choice 
of  origin  location  and  axis  orientation  for  the  coordinate 
system  used  for  describing  the  object  is  [1] 

0  =  a2\  4-  2a]2  4-  2ai3  4-  a22  4  2a23  4  a33  -  2. 

(ell) 

Using  (e  1.1),  we  arbitrarily  choose  the  dependent  param¬ 
eter  to  be  a33.  We  assume  that  at  least  one  of  a13,  a23,  fl33> 
and  a43  is  nonzero.  Otherwise,  the  surface  would  not  be 
seen  as  we  would  have  a  ruled  surface  parallel  to  the  z- 
axis.  Let  aT  =  (au,  an ,  al3,  a22 ,  a23,  a4 1#  a42l  a43,  a44). 
Upon  using  (el)  and  (el.l),  we  obtain 

dg 

t-  —  2a33z  4-  2a43 
dz 


dg 

dau 


=  jc2. 


dg  dg 

- —  =  2xy,  - —  =  2xz 
dax2  da  i3 


-^-  =  y\-^-  =  2yz,^-  =  z2 

da22  -  da23  y<"  9a  33 


dg 

0a4, 


=  2x, 


dg_ 

0«42 


,  9g  -  9g 

=  2?>  a —  =  2Z,  a —  = 
da43  aa44 


Use  of  the  preceding  and  dz/da  =  -  ( ( dg/a )  /( dg/dz ) ) 
leads  to 


dz 

da 


-(2a33z  4  2a43) 

•  ( x 2,  2;ty,  2yz,  y2,  2yz,  2x,  2 y,  2z,  1).  (e2) 


Note  that  a33  can  be  obtained  by  solving  (el.l),  and  z  can 
be  obtained  by  solving  (el). 


dg  .  .  /  da3 

—  =  ~2(ocxx  4  a2y  4  a3z)(  y  4  z~ 

dg 


a/? 

dg 


=  -2/? 


—  =  2(z  ~  Z0)  ~  2a3(alx  4  ot2y  4  a3z). 
dz 

Denote  c  =  [(z  ~  Zo)  ~~  ot3(a{x  4  a2y  4  or3z)],  and  d 
=  axx  4  a2y  4  a3z.  Then 


dz 

da 


=  ((*  -  *0 )/c,  (y  -  y o)/c,  rf(a3*  -  aiz)/c, 


rf(a3y  -  a2z)/c,  «/c). 


(02) 
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